Preface


These  notes  are  meant  to  provide  an  introduction  to  wavelet  transforms.  The  first  five  sections  can  be 
accessed  by  readers  with  some  exposure  to  signal  processing  tools.  Sec.  9  -  Sec.  13  present  advanced 
material  which  is  at  the  heart  of  wavelet  research  today.  While  maintaining  a  mathematically  rigorous 
presentation  level,  we  have  attempted  to  make  the  presentation  accessible  to  signal  processors,  and  engineers 
in  general.  To  make  this  possible,  we  have  included  three  review  sections  on  mathematics  (Sec.  6  -  Sec.  8), 
especially on $L^1$ and $L^2$ Fourier transforms, frames, and Riesz bases in infinite dimensions.

A  slightly  reduced  version  is  scheduled  to  appear  in  the  Mathematics  Section  of  the  Circuits  and 
Filters  Handbook,  to  be  published  by  CRC  Press,  Inc.,  next  year.  There  is  an  appendix  on  Distributions 
which  will  probably  be  deleted  from  the  version  to  be  published. 

Remember  that  this  is  a  handbook  chapter,  not  a  text  book  chapter.  Proofs  of  many  standard  results 
are  excluded,  but  with  proper  citations.  We  have  given  Sketches  of  Proofs  for  some  of  the  recent  wavelet 
results  (Sec.  11  -  Sec.  13,  especially)  which  brought  them  to  the  level  of  importance  they  enjoy  today. 
Comments  and  suggestions  from  you,  the  reader,  are  most  welcome.  Enjoy! 


P.  P.  Vaidyanathan  and  Igor  Djokovic 
August  1994 
A new idea is first attacked as absurd; then it is admitted to be true, but obvious;
finally  it  is  seen  to  be  so  important,  that  its  adversaries  claim  that 
they  themselves  discovered  it  —  William  James 


1.  INTRODUCTION 

Transform  techniques  such  as  the  Fourier  and  Laplace  transforms,  and  the  z-transform  have  been  used  in  a 
wide  variety  of  scientific  and  engineering  disciplines  for  a  long  time  [Van  Valkenburg,  1960],  [Papoulis,  1962], 
[Oppenheim,  et  al.,  1983].  In  a  number  of  applications  where  we  require  a  joint  time-frequency  picture,  it 
is  necessary  to  consider  other  types  of  transforms  or  time-frequency  representations.  Many  such  methods 
have evolved. In particular the wavelet transform technique [Grossman and Morlet, 1984], [Meyer, 1986],
[Daubechies, 1992] has some unique advantages over other kinds of time-frequency representations such as
the  short-time  Fourier  transform.  For  historical  developments  as  well  as  many  technical  details  and  original 
material  see  [Daubechies,  1992].  In  this  chapter  we  will  describe  some  of  these  representations,  and  explain 
the  advantages  of  the  wavelet  transform,  and  the  reason  for  its  recent  popularity. 

A  subclass  of  wavelet  transforms  [Daubechies,  1988],  has  an  intimate  connection  with  the  theory  of 
digital  filter  banks.  Filter  banks  have  been  known  to  the  signal  processing  community  for  over  two  decades, 
see  [Vaidyanathan,  1993]  and  references  therein,  especially  [Croisier  et  al.,  1976],  [Crochiere  and  Rabiner, 
1983],  [Vetterli,  1987],  [Akansu  and  Haddad,  1992]  and  [Malvar,  1992].  It  is  this  relation  that  makes  it  possible 
to  construct  in  a  systematic  way  a  wide  family  of  wavelets  with  several  desirable  properties  such  as  compact 
support  (i.e.,  finite  duration),  smoothness,  good  time-frequency  localization,  and  basis-orthonormality  (all 
these  terms  will  be  explained  later). 

The  connection  between  wavelets  and  filter  banks  finds  beautiful  mathematical  expression  in  the  theory 
of multiresolution [Mallat, 1989a]. This enables us to compute the wavelet transform coefficients using the
so-called Fast Wavelet Transform (FWT), which is essentially a tree-structured filter bank. In addition to the
practical value, many deep results from several disciplines find a unified home in the theory and development
of the wavelet transform. These disciplines include signal processing, circuit theory, communications, and mathematics.
Our emphasis here will be this unification, and the beautiful big picture that it provides. Other tutorials
on wavelets with different choices of emphasis can be found in Heil and Walnut [1989], Rioul and Vetterli
[1991], Vetterli and Herley [1992], Strang [1993], Vaidyanathan [1993] and Gopinath and Burrus [1993].

Work supported in part by Office of Naval Research grant N00014-93-1-0231, Rockwell International,
and Tektronix, Inc.

Scope  and  Outline 

The  literature  on  wavelets  is  enormous,  and  an  attempt  to  do  justice  to  everything  would  prove  to  be 
futile.  Even  a  list  of  references  that  is  fair  to  all  contributors  would  be  too  long.  We  therefore  restrict 
discussions to really basic, core material. Sections 1-5 give an overview, with the presentation given at a
level that can be comprehended by most engineers. The more advanced results on wavelets, which really
brought them great attention in recent years, are presented in the last five sections (Sec. 9-13). At the heart
of  these  results  lie  several  powerful  mathematical  tools,  which  are  usually  not  familiar  to  engineers.  We  have 
therefore  presented  a  fairly  extensive  math  review  in  three  sections  (Sec.  6-8).  We  suggest  that  the  reader 
go  through  this  review  material  once  and  then  use  it  primarily  as  a  reference. 

The advanced sections 9-13 are organized such that the main points, summarized as Theorems for
convenience of reference, can be appreciated even without the mathematical background material in Sec. 6-8.
The  mathematical  sections  do,  however,  facilitate  a  deeper  understanding.  It  is  our  hope  that  these  sections 
will  bring  most  readers  to  a  point  where  they  can  pursue  wavelet  literature  without  difficulty. 

Why  Wavelets? 

A  commonly  asked  question  is  “why  wavelets?”,  that  is,  “what  are  the  advantages  offered  by  wavelets 
over  other  types  of  transform  techniques  such  as,  for  example,  the  Fourier  transform?”  The  answer  to  this 
question  is  fairly  sophisticated,  and  also  depends  on  the  level  at  which  we  address  the  question.  Several 
discussions  addressing  this  question  are  scattered  throughout  this  chapter.  A  convenient  listing  of  the 
locations  of  these  discussions  is  given  in  the  concluding  section  (Sec.  14)  under  “Why  Wavelets?” 


General  Notations  and  Acronyms 

1.  Bold  faced  quantities  represent  matrices  and  vectors. 

2. The notations $\mathbf{A}^T$, $\mathbf{A}^*$ and $\mathbf{A}^\dagger$ represent, respectively, the transpose, conjugate, and transpose-conjugate
of the matrix $\mathbf{A}$.

3. The accent ‘tilde’ is defined as follows: $\tilde{H}(z) = H^\dagger(1/z^*)$; thus if $H(z) = \sum_n h(n) z^{-n}$ then $\tilde{H}(z) = \sum_n h^\dagger(-n) z^{-n}$. On the unit circle $\tilde{H}(z) = H^\dagger(z)$.

4. Acronyms. BIBO (Bounded-Input Bounded-Output); FIR (Finite Impulse Response); IIR (Infinite
Impulse Response); LTI (Linear Time Invariant); PR (Perfect Reconstruction); STFT (Short-Time
Fourier Transform); WT (Wavelet Transform).

5.  For  LTI  systems,  “stability”  stands  for  BIBO  stability. 

6. $\delta(n)$ denotes the unit pulse or discrete-time impulse, defined such that $\delta(0) = 1$ and $\delta(n) = 0$ otherwise.
This should be distinguished from the Dirac delta function [Oppenheim, et al., 1983], which is denoted
as $\delta_a(t)$.

7. Figures. Sampled versions of continuous time signals are indicated with an arrow on the top (e.g., Fig.
2.10(a)). The sampled versions are impulse trains of the form $\sum_n c(n)\delta_a(t - n)$, and are functions of
continuous $t$.

2.  SIGNAL  REPRESENTATION  USING  BASIS  FUNCTIONS 

The  electrical  engineer  is  very  familiar  with  the  Fourier  transform  (FT)  and  its  role  in  the  study  of  linear 
time  invariant  (LTI)  systems  or  filters.  For  example  the  frequency  response  of  an  LTI  system  is  the  FT  of 
its  impulse  response.  The  FT  is  also  used  routinely  in  the  design  and  analysis  of  circuits.  As  a  reminder,  the 
Fourier transform of a signal $x(t)$ is given by the familiar integral $X(\omega) = \int_{-\infty}^{\infty} x(t) e^{-j\omega t}\,dt$ and the inverse
transform by†

$$x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega) e^{j\omega t}\,d\omega \qquad (2.1)$$

From this equation we can say that $x(t)$ has been expressed as a linear superposition (or linear combination)
of an infinite number of functions $g_\omega(t) = e^{j\omega t}$. Since the frequency $\omega$ is a continuous variable, there are
uncountably many functions $g_\omega(t)$ to be superimposed. Electrical engineers, in particular signal processors
and communications engineers, are also familiar with two special classes of signals which can be regarded as
a superposition of countably many functions. That is,

$$x(t) = \sum_{n=-\infty}^{\infty} \alpha_n g_n(t) \qquad (2.2)$$

where $\alpha_n$ are scalars (possibly complex) uniquely determined by $x(t)$. These two examples are (i) time-limited
signals for which we can find a Fourier series (FS), and (ii) bandlimited signals which can be reconstructed
from uniformly spaced samples by weighting them with shifted sinc functions (see below).

First consider a time-limited signal $x(t)$ with duration $0 \le t \le 1$ (Fig. 2.1). Under some mild conditions
such a signal can be represented in the form (2.2) with $g_n(t) = e^{j2\pi nt}$. The expression (2.2) is then the
Fourier series of $x(t)$, and $\alpha_n$ are the Fourier coefficients. (In contrast we say that (2.1) is the Fourier integral
of $x(t)$.) The “transform domain” signal $\{\alpha_n\}$ is a sequence, and the transform domain variable is discrete,
namely the frequencies $\omega_n = 2\pi n$. Since $e^{j2\pi nt}$ is periodic in $t$ with period one, the right hand side of (2.2) is
periodic, and it represents $x(t)$ only in $0 \le t \le 1$. It is sometimes convenient to replace the complex functions
$e^{j2\pi nt}$ with the set of real functions $1$, $\sqrt{2}\cos(2\pi nt)$, $\sqrt{2}\sin(2\pi nt)$, $n > 0$, especially in circuit analysis.

† At the moment it is not necessary to worry about the existence, invertibility, and the type (e.g., $L^1$ or
$L^2$) of the FT. We return to the mathematical subtleties in Sec. 6.3.
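As a rough numerical illustration of the Fourier series representation above, the following Python sketch approximates the coefficients $\alpha_n$ by a midpoint rule and evaluates a truncated version of (2.2). The test signal, the grid size, the truncation index and the function names are illustrative assumptions, not part of the original text.

```python
import numpy as np

def fourier_coefficients(x, n_max, num_samples=4096):
    """Approximate alpha_n = integral_0^1 x(t) exp(-j 2 pi n t) dt for |n| <= n_max."""
    t = (np.arange(num_samples) + 0.5) / num_samples   # midpoints of a uniform grid on [0, 1]
    n = np.arange(-n_max, n_max + 1)
    basis = np.exp(1j * 2 * np.pi * np.outer(n, t))     # row i holds g_n(t) for n = n[i]
    alpha = (x(t) * np.conj(basis)).mean(axis=1)        # midpoint-rule inner products
    return n, alpha

def partial_sum(n, alpha, t):
    """Evaluate the truncated series sum_n alpha_n exp(j 2 pi n t) at the points t."""
    return (alpha[:, None] * np.exp(1j * 2 * np.pi * np.outer(n, t))).sum(axis=0)

def triangle(t):
    """Hypothetical test signal on [0, 1]: continuous, so the series converges quickly."""
    return 1.0 - 2.0 * np.abs(t - 0.5)

n, alpha = fourier_coefficients(triangle, n_max=50)
t_test = np.linspace(0.0, 1.0, 201)
error = np.max(np.abs(partial_sum(n, alpha, t_test).real - triangle(t_test)))
print(f"max reconstruction error with {n.size} terms: {error:.4f}")
```

Increasing `n_max` should shrink the error for this continuous test signal; a signal with jumps would converge more slowly near the discontinuities.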


Next consider a bandlimited signal $x(t)$ with Fourier transform $X(\omega)$ as demonstrated in Fig. 2.2.

If we sample the signal at the Nyquist rate $2\beta$ radians/sec (i.e., sampling period $T = \pi/\beta$), then multiple
copies of the Fourier transform are generated [Oppenheim, et al., 1983], and we can recover $x(t)$ from the
samples by use of an ideal lowpass filter $F(\omega)$ (Fig. 2.3).

Fig. 2.3. Use of lowpass filter $F(\omega)$ to recover $x(t)$ from its samples.
The impulse response of the filter is the sinc function $f(t) = \frac{\sin \beta t}{\beta t}$ so that the reconstruction formula is

$$x(t) = \sum_{n=-\infty}^{\infty} x(nT) f(t - nT) = \sum_{n=-\infty}^{\infty} x(nT)\,\frac{\sin \beta(t - nT)}{\beta(t - nT)} \qquad (2.3)$$

Comparing with (2.2) we see that the “transform domain coefficients” $\alpha_n$ can be regarded as the samples
$x(nT)$, whereas the functions $g_n(t)$ are the shifted sinc functions.
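The reconstruction of a bandlimited signal from its samples by shifted sinc functions can be tried out numerically. The sketch below assumes a band edge $\beta = \pi$ (so $T = 1$), a hypothetical test signal whose transform vanishes outside $|\omega| \le 0.8\pi$, and a truncation of the infinite sum to finitely many samples; the function name is also an assumption.

```python
import numpy as np

def sinc_reconstruct(samples, n, T, t):
    """Evaluate sum_n x(nT) f(t - nT) with f(t) = sin(beta t)/(beta t), beta = pi/T."""
    # np.sinc(u) = sin(pi u)/(pi u), so f(t - nT) = np.sinc((t - nT)/T).
    return np.sinc((t[:, None] - n[None, :] * T) / T) @ samples

beta = np.pi                      # assumed band edge in rad/sec
T = np.pi / beta                  # sampling period T = pi/beta = 1

def x(t):
    """Hypothetical bandlimited signal: squared sinc, bandlimited to |w| <= 0.8 pi < beta."""
    return np.sinc(0.4 * t) ** 2

n = np.arange(-2000, 2001)        # the infinite sum is truncated to these sample indices
t = np.linspace(-5.0, 5.0, 101)   # points at which the signal is reconstructed
x_hat = sinc_reconstruct(x(n * T), n, T, t)
print("max reconstruction error:", np.max(np.abs(x_hat - x(t))))
```

The residual error here comes only from truncating the sum; the sinc functions decay slowly, which is one practical drawback of this ideal reconstruction.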

If a signal is time-limited or bandlimited, we can therefore express it as a countable linear combination
of a set of fundamental functions (called basis functions; in fact they form an orthonormal basis, see below). If the
signal is more arbitrary (i.e., not limited in time or bandwidth) can we still obtain such a countable linear
combination?

Suppose we restrict $x(t)$ to be a finite energy signal (i.e., $\int |x(t)|^2\,dt < \infty$; also called $L^2$ signals, see
Sec. 2.2). Then this is possible. In fact, we can even find an unusual kind of basis called the wavelet basis,
fundamentally different from the Fourier basis. Representation of $x(t)$ using this basis has, in some
applications, some advantages over the Fourier representation or the short-time (windowed) Fourier representation.
Wavelet bases also exist for many other classes of signals but we will only consider the class of $L^2$ signals.
The most common kind of wavelet representation takes the form

$$x(t) = \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} c_{kn}\psi_{kn}(t) \qquad (2.4)$$


The functions $\psi_{kn}(t)$ are typically (but not necessarily) linearly independent and form a basis for finite energy
signals. The basis is very special in the sense that all the functions $\psi_{kn}(t)$ are derived from a single function
$\psi(t)$ called the wavelet, by two operations: dilation ($t \to 2^k t$) and time-shift ($t \to t - 2^{-k}n$). The advantage
of such a basis is that it allows us to capture the details of a signal at various scales, while providing
time-localization information for these “scales”. Examples in future sections will make this idea clearer.

Why  Worry  About  Signal  Representations? 

A common feature of all the above discussions is that we have taken a signal $x(t)$ and found an equivalent
representation in terms of the transform domain quantity $\{\alpha_n\}$ in (2.2), or $\{c_{kn}\}$ in (2.4). If our only aim is
to compute $\alpha_n$ from $x(t)$ and then recompute $x(t)$ from $\alpha_n$, that would be a futile exercise. The motivation
in practice is that the transform domain quantities are better suited in some sense. For example in audio
coding, decomposition of a signal into frequency components is motivated by the fact that the human ear
perceives higher frequencies with less frequency resolution. We can use this information. We can also code
the high frequency components with relatively less precision, thereby enabling data compression. In this
way we can take into account perceptual information during compression. We could also account for the
fact that the error allowed by the human ear (due to quantization of frequency components) depends on the
frequency masking property of the ear, and perform optimum bit allocation for a given bit rate.

Other applications of signal representations using wavelets include numerical analysis, solution of
differential equations, and many others [Daubechies, 1992], [Chui, 1992b], [Benedetto and Frazier, 1994].

The main point, in any case, is that we typically perform certain manipulations with the transform
domain coefficients $\alpha_n$ [or $c_{kn}$ in (2.4)] before we recombine them to form an approximation of $x(t)$ again.
Therefore, we really only have

$$\hat{x}(t) = \sum_n \hat{\alpha}_n g_n(t) \qquad (2.5)$$

where $\{\hat{\alpha}_n\}$ approximates $\{\alpha_n\}$. This discussion gives rise to many questions: how best to choose the basis
functions $g_n(t)$ for a given application? How to choose the compressed signal $\{\hat{\alpha}_n\}$ so that for a given data
rate the reconstruction error is minimized? What, indeed, is the best way to define the reconstruction error?

These  questions  are  deep  and  complicated,  and  will  take  us  too  far  afield.  We  will  not  address  them. 
Our  goal  is  to  point  out  the  basic  advantages  (sometimes)  offered  by  the  wavelet  transform  over  other  kinds 
of  transforms  (e.g.,  the  Fourier  transform). 

2.1.  The  ideal  bandpass  wavelet 

Consider a bandpass signal $x(t)$ with Fourier transform as shown in Fig. 2.4.


Fig.  2.4.  Fourier  transform  of  a  bandpass  signal. 

Such signals arise in communication applications. The bandedges of the signal are $\omega_1$ and $\omega_2$ (and $-\omega_1$ and
$-\omega_2$ on the negative side, which is natural if $x(t)$ is real). Viewed as a lowpass signal, the total bandwidth
(counting negative frequencies also) is $2\omega_2$. But viewed as a bandpass signal, the total bandwidth is only $2\beta$
where $\beta = \omega_2 - \omega_1$. Does it mean that we can sample it at the rate $2\beta$ radians/sec (which is the Nyquist
rate for the lowpass case)?

In the lowpass case, sampling at the Nyquist rate was enough to ensure that the copies of the spectrum
created by sampling do not overlap (Fig. 2.3). In the bandpass case, we have two sets of such copies; one
created by the positive half of the frequency axis $\omega_1 < \omega < \omega_2$ and the other by the negative half $-\omega_2 < \omega < -\omega_1$.
This makes the problem somewhat more complicated. It can be shown that for sampling at the rate $2\beta$
there is no overlap of images if and only if one of the edges, $\omega_1$ or $\omega_2$, is a multiple of $2\beta$. This is called the
bandpass sampling theorem. The reconstruction of $x(t)$ from the samples proceeds exactly as in the lowpass
case, except that the reconstruction filter $F(\omega)$ is now a bandpass filter (Fig. 2.5) occupying precisely the
signal bandwidth. The first part of the expression (2.3) is therefore still valid, i.e., $x(t) = \sum_n x(nT) f(t - nT)$
where $T = \pi/\beta$ again, but the sinc function is replaced with the bandpass impulse response $f(t)$.
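The band-edge condition of the bandpass sampling theorem can be compared against a direct check of whether the shifted copies of the positive band collide with the negative band. The sketch below is only an illustration under stated assumptions: band edges are given as exact rationals (for example in units of $\pi$), bands are treated as open intervals, and the function names are hypothetical.

```python
from fractions import Fraction

def edge_condition(w1, w2):
    """True if w1 or w2 is an integer multiple of 2*beta, where beta = w2 - w1."""
    two_beta = 2 * (w2 - w1)
    return w1 % two_beta == 0 or w2 % two_beta == 0

def images_overlap(w1, w2):
    """Shift the positive band (w1, w2) by multiples of 2*beta and test for a
    collision with the negative band (-w2, -w1)."""
    beta = w2 - w1
    two_beta = 2 * beta
    m_max = int(w2 / beta) + 2                          # shifts beyond this cannot reach the negative band
    for m in range(-m_max, m_max + 1):
        lo, hi = w1 + m * two_beta, w2 + m * two_beta   # shifted copy of the positive band
        if max(lo, -w2) < min(hi, -w1):                 # the two open intervals intersect
            return True
    return False

# Band edges in units of pi: an octave band and a band that violates the condition.
for w1, w2 in [(Fraction(1), Fraction(2)), (Fraction(3, 2), Fraction(5, 2))]:
    print(w1, w2, "condition:", edge_condition(w1, w2), "overlap:", images_overlap(w1, w2))
```

Exact rational arithmetic is used so that the "multiple of $2\beta$" test is not affected by floating-point rounding.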



Fig.  2.5.  Bandpass  filter  to  be  used  in  the  reconstruction  of 
the  bandpass  signal  from  its  samples. 


Fig.  2.6.  Splitting  a  signal  into  frequency  subbands. 

Given a signal $x(t)$, imagine now that we have split its frequency axis into subbands in some manner
(Fig. 2.6). Letting $y_k(t)$ denote the $k$th subband signal, we can write $x(t) = \sum_k y_k(t)$. This can be visualized
as passing $x(t)$ through a bank of filters $\{H_k(\omega)\}$, Fig. 2.7(a), with responses as in Fig. 2.7(b). Note that
each subband region is symmetric with respect to zero frequency, and therefore supports positive as well
as negative frequencies. If the subband region $\omega_k \le |\omega| < \omega_{k+1}$ satisfies the bandpass sampling condition,
then the bandpass signal $y_k(t)$ can be expressed as a linear combination of its samples as before. Thus,
$x(t) = \sum_k y_k(t) = \sum_k \sum_n y_k(nT_k) f_k(t - nT_k)$, where $T_k = \pi/\beta_k$. Here $f_k(t)$ is the impulse response of the
reconstruction filter (or synthesis filter) $F_k(\omega)$ shown in Fig. 2.7(c). Fig. 2.7(a) also shows this reconstruction
schematic.
Fig. 2.8. Two possible schemes to decompose a signal into frequency bands.
(a) Uniform splitting, and (b) octave band splitting.
The responses shown are those of synthesis filters.

Fig. 2.8 shows the set of synthesis filters $\{F_k(\omega)\}$ for two examples of frequency splitting arrangement,
namely uniform splitting and nonuniform (octave) splitting. We will see later that the uniform splitting
arrangement gives an example of the short-time Fourier transform (STFT) representation (Sec. 3 and
9). In this section we are interested in octave splitting. The bandedges of the filters here are $\omega_k = 2^k\pi$
($k = \ldots, -1, 0, 1, 2, \ldots$). The bandedges are such that $y_k(t)$ is a signal satisfying the bandpass sampling
theorem. It has $\beta_k = 2^k\pi$ according to the notation of Fig. 2.7. It can be sampled at period $T_k = \pi/\beta_k = 2^{-k}$
without aliasing, and we can reconstruct it from samples as

$$y_k(t) = \sum_{n=-\infty}^{\infty} y_k(2^{-k}n) f_k(t - 2^{-k}n). \qquad (2.6)$$

As $k$ increases, the bandwidths of the filters increase so the sample spacing $T_k = 2^{-k}$ gets finer. Since
$x(t) = \sum_k y_k(t)$ we see that $x(t)$ can be expressed as

$$x(t) = \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} y_k(2^{-k}n) f_k(t - 2^{-k}n). \qquad (2.7)$$

Our definition of the filters shows that the frequency responses are scaled versions of each other, that is
$F_k(\omega) = 2^{-k}\Psi(2^{-k}\omega)$ with $\Psi(\omega)$ as in Fig. 2.9. The impulse responses are therefore related as $f_k(t) = \psi(2^k t)$,
and we can rewrite (2.7) as

$$x(t) = \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} y_k(2^{-k}n)\,\psi(2^k t - n) \qquad (2.8)$$

We will write this as $x(t) = \sum_k \sum_n c_{kn}\psi_{kn}(t)$ by defining $c_{kn} = 2^{-k/2} y_k(2^{-k}n)$ and

$$\psi_{kn}(t) = 2^{k/2}\psi(2^k t - n) = 2^{k/2}\psi(2^k(t - 2^{-k}n)) \qquad (2.9)$$

Then the functions $\psi_{kn}(t)$ will have the same energy $\int |\psi_{kn}(t)|^2\,dt$ for all $k, n$. From the analysis/synthesis
filter bank point of view (Fig. 2.7) this is equivalent to making $H_k(\omega) = F_k(\omega)$ and rescaling as shown in
Fig. 2.10. With filters so rescaled, the wavelet coefficients $c_{kn}$ are just samples of the outputs of the analysis
filters $H_k(\omega)$.



Fig.  2.9.  The  fundamental  bandpass  function  that  generates  a  bandpass  wavelet. 


Fig. 2.10. The octave-band splitting scheme. (a) The analysis bank,
samplers and synthesis bank, and (b) the filter responses.

The function $\psi(2^k t)$ is a dilated version of $\psi(t)$ (squeezed version if $k > 0$ and stretched version if $k < 0$).
The dilation factor $2^k$ is a power of two, so this is said to be a dyadic dilation. The function $\psi(2^k(t - 2^{-k}n))$
is a shifted version of the dilated version. Thus we have expressed $x(t)$ as a linear combination of shifted
versions of (dyadic) dilated versions of a single function $\psi(t)$. The shifts $2^{-k}n$ are in integer multiples of $2^{-k}$
where $k$ governs the dilation. For completeness note that the impulse response $\psi(t)$ corresponding to the
function $\Psi(\omega)$ in Fig. 2.9 is given by

$$\psi(t) = \frac{\sin(\pi t/2)}{\pi t/2}\,\cos(3\pi t/2) \quad \text{(ideal bandpass wavelet).} \qquad (2.10)$$


This  is  plotted  in  Fig.  2.11. 


Fig.  2.11.  The  ideal  bandpass  wavelet. 
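The closed form $\psi(t) = \frac{\sin(\pi t/2)}{\pi t/2}\cos(3\pi t/2)$ can be cross-checked against a numerical inverse Fourier transform. The sketch below assumes that the bandpass function $\Psi(\omega)$ has unit height on $\pi \le |\omega| < 2\pi$ and is zero elsewhere (an assumption about Fig. 2.9, which is consistent with the octave bandedges $\omega_k = 2^k\pi$); the function names and grid sizes are illustrative.

```python
import numpy as np

def psi_closed_form(t):
    """Ideal bandpass wavelet: [sin(pi t/2)/(pi t/2)] cos(3 pi t/2)."""
    # np.sinc(u) = sin(pi u)/(pi u), so np.sinc(t/2) = sin(pi t/2)/(pi t/2).
    return np.sinc(t / 2.0) * np.cos(1.5 * np.pi * t)

def psi_from_spectrum(t, num=8000):
    """(1/2pi) * integral of Psi(w) exp(jwt) dw, with Psi = 1 on pi <= |w| < 2pi (assumed)."""
    w = np.pi + (np.arange(num) + 0.5) * (np.pi / num)   # midpoints of [pi, 2pi]
    # Psi is real and even, so both bands combine into (1/pi) * integral_pi^2pi cos(wt) dw.
    # The midpoint rule gives mean * (band width pi), and the factor 1/pi cancels it.
    return np.cos(np.outer(t, w)).mean(axis=1)

t = np.linspace(-6.0, 6.0, 241)
difference = np.max(np.abs(psi_closed_form(t) - psi_from_spectrum(t)))
print("max difference between closed form and numerical inverse FT:", difference)
```

A small difference supports the reading that the wavelet is the impulse response of an ideal filter occupying one octave.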

In (2.8) we have obtained a wavelet representation for $x(t)$ (compare with (2.4)). The function $\psi(t)$ is called
the ideal bandpass wavelet. It has also been known as the Littlewood-Paley wavelet. We will now introduce
some  terminology  for  convenience  and  then  return  to  more  detailed  definitions  and  discussions  of  the  wavelet 
transform. 


2.2. $L^2$ spaces, basis functions and orthonormal bases

Most of our discussions will be restricted to the class of $L^2$ functions or square integrable functions, that is,
functions $x(t)$ for which $\int |x(t)|^2\,dt$ exists and has a finite value. The norm or $L^2$ norm of such functions,
denoted $\|x(t)\|_2$, is defined as $\|x(t)\|_2 = \left(\int |x(t)|^2\,dt\right)^{1/2}$. The notation $L^2[a,b]$ stands for $L^2$ functions that
are zero outside the interval $a \le t \le b$. The set $L^2(R)$ is the class of $L^2$ functions supported on the real line
$-\infty < t < \infty$. We often abbreviate $L^2(R)$ as $L^2$.

It turns out that the class of $L^2$ functions forms a (normed) linear vector space, i.e., any linear
combination of functions in $L^2$ is still in $L^2$. In fact it forms a special linear space such that there exists a
countable basis. That is, there is a sequence of linearly independent functions $\{g_n(t)\}$ in $L^2$ such that any
$L^2$ function $x(t)$ can be expressed as $x(t) = \sum_n \alpha_n g_n(t)$, for a unique set of $\{\alpha_n\}$. We say that $g_n(t)$ are the
basis functions. In fact $L^2$ spaces have orthonormal bases. For such a basis, the basis functions satisfy

$$\langle g_k(t), g_m(t) \rangle = \delta(k - m) \qquad (2.11)$$

where the notation $\langle f(t), g(t) \rangle = \int f(t) g^*(t)\,dt$ denotes the inner product between $f(t)$ and $g(t)$. For an
orthonormal basis, the coefficients $\alpha_n$ in the expansion $x(t) = \sum_{n=-\infty}^{\infty} \alpha_n g_n(t)$ can thus be computed using
the exceptionally simple relation

$$\alpha_n = \langle x(t), g_n(t) \rangle \qquad (2.12)$$

We have already seen two examples of orthonormal bases above. The first is the Fourier series expansion of
a time limited signal ($0 \le t \le 1$). Here the basis functions $e^{j2\pi nt}$ are clearly orthonormal (with integrals
going from 0 to 1). The second example is the expansion (2.3) of a bandlimited signal; it can be shown
that the shifted versions $f(t - nT)$ of the sinc functions form an orthonormal basis for bandlimited signals
(integrals going from $-\infty$ to $\infty$).

Orthogonal projections. Suppose we consider a subset $\{g_{n_i}(t)\}$ of the orthonormal basis $\{g_n(t)\}$.
Let $S$ denote the subspace generated by $\{g_{n_i}(t)\}$ (an accurate statement would be that $S$ is the “closure of
the span of $\{g_{n_i}(t)\}$”; see Sec. 7.2). Consider the linear combination $y(t) = \sum_i \alpha_{n_i} g_{n_i}(t)$ where the $\alpha_{n_i}$ are
evaluated as above, that is $\alpha_{n_i} = \langle x(t), g_{n_i}(t) \rangle$ for some signal $x(t)$. Then $y(t) \in S$, and it can be shown
that among all functions in $S$, $y(t)$ is the unique signal closest to $x(t)$ (i.e., $\|x(t) - y(t)\|_2$ is the smallest).
We say that $y(t)$ is the orthogonal projection of $x(t)$ onto the subspace $S$, and write

$$y(t) = P_S[x(t)] \qquad (2.13)$$
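The closest-point property of the orthogonal projection can be illustrated numerically. The sketch below uses a few Fourier basis functions $e^{j2\pi nt}$ on $[0, 1]$ as the retained subset, approximates inner products by a midpoint rule, and compares the projection with one other, randomly perturbed element of the same subspace. The test signal, the chosen subset, and all names are assumptions made for illustration.

```python
import numpy as np

M = 1024
t = (np.arange(M) + 0.5) / M                       # midpoint grid on [0, 1]

def inner(f, g):
    """Midpoint-rule approximation of <f, g> = integral_0^1 f(t) g*(t) dt."""
    return np.mean(f * np.conj(g))

def dist(f):
    """Discrete version of the L2 norm on [0, 1]."""
    return np.sqrt(np.mean(np.abs(f) ** 2))

x = t * (1.0 - t) * np.exp(t)                      # hypothetical signal on [0, 1]
subset = np.arange(-3, 4)                          # indices n_i of the retained basis functions
G = np.exp(1j * 2 * np.pi * np.outer(subset, t))   # rows are g_{n_i}(t) = exp(j 2 pi n_i t)

alpha = np.array([inner(x, g) for g in G])         # alpha_{n_i} = <x, g_{n_i}>
y = alpha @ G                                      # y(t) = sum_i alpha_{n_i} g_{n_i}(t)

residual = x - y                                   # should be orthogonal to every g_{n_i}
print("largest |<x - y, g_{n_i}>|:", max(abs(inner(residual, g)) for g in G))

rng = np.random.default_rng(0)
other = (alpha + 0.05 * rng.standard_normal(alpha.size)) @ G   # another element of S
print("||x - y||     =", dist(x - y))
print("||x - other|| =", dist(x - other))
```

Because the residual is orthogonal to the subspace, any change of the coefficients adds to the squared error, which is why the perturbed element ends up farther from the signal.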


2.3.  Wavelet  transforms 

If a signal $x(t)$ is in $L^2$ then its Fourier transform $X(\omega)$ exists in the $L^2$ sense (Sec. 6.3). We will see in
Sec. 6.4 that the discussion which resulted in the expression (2.8) is applicable for any signal $x(t)$ in $L^2$. Eq.
(2.8) means that the signal can be expressed as a linear combination of the form

$$x(t) = \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} c_{kn}\,\underbrace{2^{k/2}\psi(2^k t - n)}_{\psi_{kn}(t)} \qquad (2.14)$$


where $\psi(t)$ is the impulse response (Fig. 2.11) of the bandpass function $\Psi(\omega)$ in Fig. 2.9.† Since the frequency
responses for two different values of $k$ do not overlap, the functions $\psi_{kn}(t)$ and $\psi_{mi}(t)$ are orthogonal for
$k \ne m$ (use Parseval’s relation). For a given $k$, the functions $\psi_{kn}(t)$ are shifted versions of the impulse
response of the bandpass filter $F_k(\omega)$. From the ideal nature of this bandpass filter we can show that $\psi_{kn}(t)$
and $\psi_{km}(t)$ are also orthogonal for $n \ne m$. Thus, the set of functions $\{\psi_{kn}(t)\}$ with $k$ and $n$ ranging over all
integers forms an orthonormal basis for the class of $L^2$ functions. That is, any $L^2$ function can be expressed
as in (2.14) and furthermore

$$\langle \psi_{kn}(t), \psi_{mi}(t) \rangle = \delta(k - m)\,\delta(n - i). \qquad (2.15)$$

† The above equality, and the convergence of the summation should be interpreted in the $L^2$ sense, see
Sec. 6.2.

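Because of this orthonormality, the coefficients $c_{kn}$ are computed very easily as

$$c_{kn} = \int_{-\infty}^{\infty} x(t)\,2^{k/2}\psi^*(2^k t - n)\,dt. \qquad (2.16)$$

Defining

$$\eta(t) = \psi^*(-t) \qquad (2.17)$$

this takes the form

$$c_{kn} = \int_{-\infty}^{\infty} x(t)\,2^{k/2}\eta(n - 2^k t)\,dt \qquad (2.18)$$

resembling a convolution.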

Wavelet transform definitions. A set of basis functions $\psi_{kn}(t)$ derived from a single function $\psi(t)$
by dilations and shifts of the form

$$\psi_{kn}(t) = 2^{k/2}\psi(2^k t - n) \qquad (2.19)$$

is said to be a wavelet basis, and $\psi(t)$ is called the wavelet function. The coefficients $c_{kn}$ are the wavelet
transform coefficients. The formula (2.16) which performs the transformation from $x(t)$ to $c_{kn}$ is the wavelet
transform of the signal $x(t)$. Eqn. (2.14) is the wavelet representation or the inverse wavelet transform.
While this is only a special case of more general wavelet decompositions outlined in Sec. 2.7, it is perhaps
the most popular and useful one.

Note that the $k$th dilated version $\psi(2^k t)$ has the shifted versions $\psi(2^k t - n) = \psi(2^k(t - 2^{-k}n))$, so the
amount of shift is in integer multiples of $2^{-k}$. Thus the stretched versions are shifted by larger amounts and
squeezed versions by smaller amounts. Even though we developed these ideas based on an example, the
above definitions still hold generally for any orthonormal wavelet basis. For the ideal bandpass wavelet, the
function $\psi(t)$ is real and symmetric (see (2.10)) so that $\eta(t) = \psi(t)$. For more general orthonormal wavelets
we have the relation $\eta(t) = \psi^*(-t)$. We say that $\eta(t)$ is the analyzing wavelet (because of (2.18)) and $\psi(t)$ the
synthesis wavelet (because of (2.14)). For the non-orthonormal case we still have the transform and inverse
transform equations as above, but the relation between $\psi(t)$ and $\eta(t)$ is not as simple as $\eta(t) = \psi^*(-t)$.
We will not discuss this.
Before exploring the properties and usefulness of wavelets let us turn to a distinctly different example.
This will show that, unlike the Fourier basis functions, the wavelet basis functions can be designed
by the user. This makes them more flexible, interesting and useful.

2.4.  The  Haar  wavelet  basis 

As early as 1910 an orthonormal basis for $L^2$ functions was found [Haar, 1910] which satisfies the
definition of a wavelet basis given above! That is, the basis functions $\psi_{kn}(t)$ are derived from a single function
$\psi(t)$ using dilations and shifts as in (2.19). To explain this system first consider a signal $x(t) \in L^2[0, 1]$. The
Haar basis is built from two functions called $\phi(t)$ and $\psi(t)$, as described in Fig. 2.12. The basis function $\phi(t)$
is a constant in $[0, 1]$. The basis function $\psi(t)$ is constant on each half interval, and its integral is zero. After
this, the remaining basis functions are obtained from $\psi(t)$ by dilations and shifts as indicated. It is clear
from the figure that any two of these functions are mutually orthogonal. We have an orthonormal set, and
it can be shown that this set of functions is an orthonormal basis for $L^2[0, 1]$. However, this is not exactly a
wavelet basis yet, because of the presence of $\phi(t)$.†
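The Haar construction on $[0, 1]$ can be explored with a short numerical sketch. It assumes the sign convention that $\psi(t)$ equals $+1$ on $[0, 1/2)$ and $-1$ on $[1/2, 1)$ (Fig. 2.12 is not reproduced here), uses the normalization $\psi_{kn}(t) = 2^{k/2}\psi(2^k t - n)$, and keeps only a finite number of levels; the test signal and all names are hypothetical.

```python
import numpy as np

M = 1024                                    # grid size; a power of two keeps dyadic breakpoints on cell edges
t = (np.arange(M) + 0.5) / M                # midpoints of [0, 1]

def phi(t):
    """phi(t) = 1 on [0, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 1), 1.0, 0.0)

def psi(t):
    """Haar wavelet, assumed +1 on [0, 1/2) and -1 on [1/2, 1)."""
    return np.where((t >= 0) & (t < 0.5), 1.0, 0.0) - np.where((t >= 0.5) & (t < 1), 1.0, 0.0)

def psi_kn(k, n, t):
    """psi_kn(t) = 2^(k/2) psi(2^k t - n)."""
    return 2.0 ** (k / 2) * psi(2.0 ** k * t - n)

K = 5                                       # keep levels k = 0, ..., K-1 (an arbitrary truncation)
basis = [phi(t)] + [psi_kn(k, n, t) for k in range(K) for n in range(2 ** k)]
B = np.array(basis)                         # 2^K sampled basis functions, one per row

gram = B @ B.T / M                          # midpoint-rule inner products <b_i, b_j>
print("orthonormal:", np.allclose(gram, np.eye(len(basis))))

x = np.sin(2 * np.pi * t) + t               # hypothetical test signal on [0, 1]
coeffs = B @ x / M                          # c_i = <x, b_i>
x_hat = coeffs @ B                          # partial reconstruction from 2^K coefficients
print("rms error of the partial reconstruction:", np.sqrt(np.mean((x - x_hat) ** 2)))
```

With levels up to $K - 1$ the partial sum is piecewise constant on intervals of length $2^{-K}$, so each added level refines the approximation at a finer scale.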

If we eliminate the requirement that $x(t)$ be supported or defined only on [0,1] and consider $L^2(R)$
functions, then we can still obtain an orthonormal basis of the above form by including the shifted versions
$\{\psi(2^k t - n)\}$ for all integer values of n, and also including the shifted versions $\{\phi(t - n)\}$. An alternative to
the use of $\{\phi(t - n)\}$ would be to use stretched (i.e., $\psi(2^k t)$, $k < 0$) as well as squeezed (i.e., $\psi(2^k t)$, $k > 0$)
versions of $\psi(t)$. The set of functions can thus be written as in (2.19), which has the form of a wavelet basis.
It can be shown that this forms an orthonormal basis for $L^2(R)$. The Fourier transform of the Haar wavelet
is given by

$$\Psi(\omega) = j e^{-j\omega/2}\, \frac{\sin^2(\omega/4)}{\omega/4} \quad \text{(Haar wavelet)}. \qquad (2.20)$$

The Haar wavelet has limited duration in time, whereas the ideal bandpass wavelet (2.10), being
bandlimited, has infinite duration in time.
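As a sanity check on the Fourier transform of the Haar wavelet, the sketch below compares a numerical evaluation of $\int \psi(t) e^{-j\omega t} dt$ with the closed form $j e^{-j\omega/2} \sin^2(\omega/4)/(\omega/4)$. It assumes the usual Haar shape ($+1$ on the first half of [0,1], $-1$ on the second); the function names and test frequencies are illustrative.

```python
import numpy as np

def haar_ft_closed_form(w):
    """Closed form j e^{-jw/2} sin^2(w/4) / (w/4), for w != 0."""
    return 1j * np.exp(-1j * w / 2) * np.sin(w / 4) ** 2 / (w / 4)

def haar_ft_numeric(w, num=200_000):
    """Midpoint-rule approximation of the integral of psi(t) e^{-jwt} over [0, 1]."""
    t = (np.arange(num) + 0.5) / num
    psi = np.where(t < 0.5, 1.0, -1.0)   # assumed Haar shape
    return np.sum(psi * np.exp(-1j * w * t)) / num

w_values = np.array([0.5, 2.0, 7.0, 25.0])
numeric = np.array([haar_ft_numeric(w) for w in w_values])
closed = haar_ft_closed_form(w_values)
max_error = np.max(np.abs(numeric - closed))
print(max_error)
```

Since $\sin^2(\omega/4) \le 1$, the closed form also shows directly that $|\Psi(\omega)| \le 4/|\omega|$, which is the slow $1/\omega$ decay associated with the discontinuities of the Haar wavelet.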


† We will see in Sec. 10 that the function $\phi(t)$ arises naturally in the context of the fundamental idea of
multiresolution.

2.5. Basic properties of wavelet transforms

Based  on  the  definitions  and  examples  provided  so  far  we  can  already  draw  some  very  interesting  conclusions 
about  wavelet  transforms,  and  obtain  a  preliminary  comparison  with  the  Fourier  transform. 


[Figure 2.13(a): analysis bank, samplers producing the wavelet coefficients, and synthesis bank.
$\eta(t)$ is the analyzing wavelet; $\psi(t)$ is the synthesizing wavelet.]

Fig. 2.13. (a) Representing the dyadic wavelet transform as an analysis
bank followed by samplers, and the inverse transform as a synthesis bank.
For the orthonormal case, $\psi(t) = \eta^*(-t)$ and $f_k(t) = h_k^*(-t)$.
(b) Filter responses for the example where $\psi(t)$ is the ideal bandpass wavelet.

1. Concept of scale. The functions $\psi_{kn}(t)$ are useful to represent finer and finer “variations” in the signal
$x(t)$ at various levels. For large k, the function $\psi_{kn}(t)$ looks like a “high frequency signal.” This is
especially clear from the plots of the Haar basis functions. (For the bandpass wavelets, see below.) Since
these basis functions are not sinusoids, we do not use the term frequency but rather the term “scale”.
We say that the component $\psi_{kn}(t)$ represents a finer scale for larger k. Accordingly k (sometimes 1/k)
is called the scale variable. Thus, the function $x(t)$ has been represented as a linear combination of
component functions that represent variations at different “scales”. For instance, consider the Haar
basis. If the signal expansion (2.14) has a relatively large value of $c_{4,2}$, this means that the component
at scale k = 4 has large energy in the interval $[2/2^4, 3/2^4]$ (Fig. 2.13).

2. Localized basis. The above comment shows that if a signal has energy at a particular scale concentrated
in a slot in the time domain, then the corresponding $c_{kn}$ has large value, i.e., $\psi_{kn}(t)$ contributes more
to $x(t)$. The wavelet basis therefore provides localization information in the time domain as well as in
the scale domain. For example, if the signal is zero everywhere except in the interval $[2/2^4, 3/2^4]$, then the
subset of the Haar basis functions which do not have their support in this interval are simply absent in
this expansion.
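The localization property can be illustrated with a small numerical experiment. The sketch below assumes the usual Haar wavelet and basis functions $2^{k/2}\psi(2^k t - n)$, and uses a hypothetical test signal (one period of a sinusoid) that is nonzero only on $[2/2^4, 3/2^4]$. It computes the coefficients for scales k = 0, ..., 5 and lists the index pairs whose basis function has no overlap with that interval.

```python
import numpy as np

def haar_psi(t):
    """Haar wavelet, assumed +1 on [0, 1/2), -1 on [1/2, 1), zero elsewhere."""
    t = np.asarray(t, dtype=float)
    first_half = np.where((t >= 0) & (t < 0.5), 1.0, 0.0)
    second_half = np.where((t >= 0.5) & (t < 1), 1.0, 0.0)
    return first_half - second_half

N = 2 ** 12
t = (np.arange(N) + 0.5) / N      # midpoint grid on [0, 1)
dt = 1.0 / N

# Hypothetical test signal: nonzero only on [2/16, 3/16).
lo, hi = 2 / 16, 3 / 16
x = np.where((t >= lo) & (t < hi), np.sin(2 * np.pi * 16 * (t - lo)), 0.0)

# Coefficients c_kn = integral of x(t) psi_kn(t) dt (Riemann sum).
coeffs = {}
for k in range(6):
    for n in range(2 ** k):
        psi_kn = 2.0 ** (k / 2) * haar_psi(2.0 ** k * t - n)
        coeffs[(k, n)] = np.sum(x * psi_kn) * dt

# Index pairs whose support [n/2^k, (n+1)/2^k] does not meet [lo, hi).
outside = [(k, n) for (k, n) in coeffs
           if (n + 1) / 2 ** k <= lo or n / 2 ** k >= hi]
largest = max(coeffs, key=lambda kn: abs(coeffs[kn]))
print(largest, max(abs(coeffs[kn]) for kn in outside))
```

Every coefficient in `outside` vanishes because the corresponding basis function is zero wherever the signal is nonzero, which is exactly the statement made in item 2.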

Note that the Haar wavelet has compact support, that is, the function $\psi(t)$ is zero everywhere outside a closed
bounded interval (namely [0,1] here). While the above discussions are motivated by the Haar basis, many of
them are typically true, with some obvious modifications, for more general wavelets. Consider for example
the ideal bandpass wavelet (Fig. 2.11) obtained from the bandpass filter $\Psi(\omega)$ in Fig. 2.9. In this case
the basis functions do not have compact support but are still locally concentrated around t = 0. Moreover,
the basis functions for large k represent “fine” information, namely the frequency component around the
center frequency of the filter $H_k(\omega)$ (Fig. 2.10). The Haar wavelet and the ideal bandpass wavelet are two
extreme examples (one is time limited and the other bandlimited). We will find later that many intermediate
examples can be constructed.

2.6. Filter Bank Interpretation and Time-Frequency Representation

We know that the wavelet coefficients $c_{kn}$ for the ideal bandpass wavelet can be viewed as the sampled version
of the output of a bandpass filter (Fig. 2.10(a)). The same is true for any kind of wavelet transform. For this
recall the expression (2.18) for the wavelet coefficients. This can be interpreted as the set of sampled output
sequences of a bank of filters $H_k(\omega)$ with impulse response $h_k(t) = 2^{k/2} h_0(2^k t)$ where $h_0(t) = \eta(t)$. Thus
the wavelet transform can be interpreted as a nonuniform continuous-time analysis filter bank, followed by
samplers. The Haar basis and ideal bandpass wavelet basis are two examples of the choice of these bandpass
filters!

The wavelet coefficients $c_{kn}$ for a given scale k are therefore obtained by sampling the output $y_k(t)$ of
the bandpass filter $H_k(\omega)$, as indicated in Fig. 2.10(a). The first subscript k (the scale variable) represents
the filter number. As k increases by one, the center frequency $\omega_k$ increases by a factor of two. The wavelet
coefficients $c_{kn}$ at scale k are merely the samples $y_k(2^{-k}n)$. As k increases the filter bandwidth increases,
so the samples are spaced by a proportionately finer amount $2^{-k}$. The quantity $c_{kn} = y_k(2^{-k}n)$ measures
the “amount” of the “frequency component” around the center frequency $\omega_k$ of the analysis filter $H_k(\omega)$,
localized in time around $2^{-k}n$.
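The equivalence between inner products and sampled filter outputs can be sketched numerically. The example below assumes the Haar wavelet (so that $\eta(t) = \psi(-t)$ is real), a hypothetical test signal on [0,1), and a midpoint grid; the variable names are illustrative. It computes $c_{kn}$ once as an inner product with $2^{k/2}\psi(2^k t - n)$ and once by convolving the signal with $h_k(t) = 2^{k/2}\eta(2^k t)$ and keeping the samples at $t = 2^{-k}n$.

```python
import numpy as np

def haar_psi(t):
    """Haar wavelet, assumed +1 on [0, 1/2), -1 on [1/2, 1), zero elsewhere."""
    t = np.asarray(t, dtype=float)
    first_half = np.where((t >= 0) & (t < 0.5), 1.0, 0.0)
    second_half = np.where((t >= 0.5) & (t < 1), 1.0, 0.0)
    return first_half - second_half

N = 2 ** 10                       # samples on [0, 1)
dt = 1.0 / N
t = (np.arange(N) + 0.5) * dt     # midpoint grid
x = np.sin(2 * np.pi * 3 * t) + t ** 2   # arbitrary test signal

k = 3
L = N // 2 ** k                   # grid points spanned by one wavelet at scale k

# Direct inner products c_kn = integral of x(t) psi_kn(t) dt.
direct = np.array([
    np.sum(x * 2.0 ** (k / 2) * haar_psi(2.0 ** k * t - n)) * dt
    for n in range(2 ** k)
])

# Filter view: h_k(t) = 2^(k/2) eta(2^k t) with eta(t) = psi(-t),
# which is supported on (-2^-k, 0].
s = -(np.arange(L) + 0.5) * dt    # sample points of h_k, in decreasing time order
h_k = 2.0 ** (k / 2) * haar_psi(-2.0 ** k * s)

# np.convolve expects taps in increasing time order, hence the reversal.
# With mode="valid", y_k[m] approximates the filter output at time m * dt.
y_k = np.convolve(x, h_k[::-1], mode="valid") * dt
sampled = y_k[::L]                # outputs at t = 2^-k n, n = 0 .. 2^k - 1

print(np.allclose(sampled, direct))
```

Keeping only every L-th output sample is the discrete counterpart of the sampler that follows each analysis filter.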

In wavelet transformation, the transform domain is represented by the two integer variables k and n.
This means that the transform domain is two dimensional (the time-frequency domain), and is discretized.
We say that $c_{kn}$ is a time-frequency representation of $x(t)$. We will see in Sec. 3 that this is an improvement
over another time-frequency representation called the short time Fourier transform (STFT) introduced earlier
in the signal processing context.

Synthesis Filter bank and reconstruction. The inner sum in (2.14) can be interpreted as follows:
for each k, convert the sequence $c_{kn}$ into an impulse train† $\sum_n c_{kn}\delta_a(t - 2^{-k}n)$ and pass it through a
bandpass filter $F_k(\omega) = 2^{-k/2}\Psi(2^{-k}\omega)$ with impulse response $f_k(t) = 2^{k/2}\psi(2^k t)$. The outer sum merely
adds the outputs of all these filters. Figs. 2.7(a) and 2.10(a) show this interpretation. Therefore, the
reconstruction of the signal $x(t)$ from the wavelet coefficients $c_{kn}$ is equivalent to the implementation of
a nonuniform continuous-time synthesis filter bank, with synthesis filters $f_k(t) = 2^{k/2}f_0(2^k t)$ generated by
dilations of a single filter $f_0(t) = \psi(t)$.

As mentioned in Sec. 2.3, the analyzing wavelet $\eta(t)$ and the synthesis wavelet $\psi(t)$ are related by
$\eta(t) = \psi^*(-t)$ in the orthonormal case. So the analysis and synthesis filters are related as $h_k(t) = f_k^*(-t)$,
that is, $H_k(\omega) = F_k^*(\omega)$. For the special case of the ideal bandpass wavelet (2.10), $\psi(t)$ is real and symmetric
so that $f_k(t) = f_k^*(-t)$, i.e., $h_k(t) = f_k(t)$. Fig. 2.13 summarizes the relations described in the preceding
paragraphs.

Design  of  Wavelet  Functions 

Since all the filters in the analysis and synthesis banks are derived from the wavelet function $\psi(t)$, the
quality of the frequency responses depends directly on $\Psi(\omega)$. In the time domain, the Haar basis has poor
smoothness (it is not even continuous) but it is well-localized (compactly supported). Its Fourier transform
$\Psi(\omega)$, given in (2.20), decays only as $1/\omega$ for large $\omega$. The ideal bandpass wavelet, on the other hand, is
poorly localized in time but has very smooth behavior. In fact, since it is bandlimited, $\psi(t)$ is infinitely
differentiable. But it decays only as $1/t$ for large t. Thus the Haar wavelet and the ideal bandpass wavelet
represent two opposite extremes of the possible choices we have.

† $\delta_a(t)$ is the Dirac delta function [Oppenheim, et. al., 1983]. Here it is used only as a schematic. The
true meaning is just that the output of $f_k(t)$ is $\sum_n c_{kn} f_k(t - 2^{-k}n)$.

We could carefully design the wavelet $\psi(t)$ such that it is reasonably well localized in the time domain, while
at the same time sufficiently smooth or “regular”. The term regularity is often used to quantify the degree
of smoothness. For example, the number of times we can differentiate the wavelet $\psi(t)$ and the degree of
continuity (so-called Hölder index) of the last derivative are taken as measures of regularity. We will return
to this in Sec. 11-13 where we also present systematic procedures for design of the function $\psi(t)$. This
can be designed in such a way that $\{2^{k/2}\psi(2^k t - n)\}$ forms an orthonormal basis with prescribed decay and
regularity properties. It is also possible to design $\psi(t)$ such that we obtain other kinds of structures rather
than an orthonormal basis, e.g., a Riesz basis or a frame (Sec. 7,8).

2.7.  Wavelet  Basis  and  Fourier  Basis 

Returning to the Fourier basis $g_k(t) = e^{j2\pi kt}$ for functions supported on [0,1], we see that $g_k(t) = g_1(kt)$
so that all the functions are dilated versions (dilations being integers rather than powers of integers) of
$g_1(t)$! However these do not have the localization property of wavelets. To understand this, note that $e^{j2\pi kt}$
has unit magnitude everywhere, and sines and cosines are nonzero almost everywhere. Thus if we have a
function $x(t)$ that is identically zero in a certain time slot (e.g., Fig. 2.14), then in order for the infinite
series to represent $x(t)$, there has to be extreme cancellation of terms in that time slot.


Fig. 2.14. Example of an $L^2[0,1]$ signal $x(t)$ for which the Haar component $\psi_{4,2}(t)$ dominates.

In contrast, if we use a compactly supported wavelet basis, it provides us localization information, as well
as information about “frequency contents” in the form of “scales” (Sec. 2.5). The “transform domain”
in traditional Fourier transform is represented by a single continuous variable $\omega$. In the wavelet transform,
where the transform coefficients are $c_{kn}$, the transform domain is represented by two integers k and n.

It is also clear that wavelet transforms provide a great deal of flexibility because we can choose $\psi(t)$.
With Fourier transforms on the other hand, the basis functions (sines and cosines) are pretty much fixed
(see, however, Sec. 3 on short-time Fourier transform).

2.8. More General Form of Wavelet Transformation

The most general form of the wavelet transform is given by

$$X(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi\!\left(\frac{t-b}{a}\right) dt, \qquad (2.21)$$

where a and b are real. This is called the continuous wavelet transform (CWT) because a and b are continuous
variables. The transform domain is a two dimensional domain (a, b). The restricted version of this where a
and b take a discrete set of values $a = c^{-k}$ and $b = c^{-k}n$, where k and n vary over the set of all integers, is
called the Discrete Wavelet Transform (DWT). The further special case where c = 2, that is, $a = 2^{-k}$ and
$b = 2^{-k}n$, is the wavelet transform discussed so far (see eq. (2.16)) and is called the dyadic DWT. Expansions
of the form (2.14) are also called wavelet series expansions by analogy with the Fourier series expansion (a
summation rather than an integral).

For fixed a, Eq. (2.21) is a convolution. Thus, if we apply the input signal $x(t)$ to a filter with impulse
response $\psi(-t/a)/\sqrt{|a|}$, then its output, evaluated at time b, will be $X(a,b)$. The filter has frequency response
$\sqrt{|a|}\,\Psi(-a\omega)$. If we imagine that $\Psi(\omega)$ has a good bandpass response with center frequency $\omega_0$, then the
above filter is bandpass with center frequency $-a^{-1}\omega_0$. That is, the wavelet transform $X(a,b)$, which is
the output of the filter at time b, represents the “frequency content” of $x(t)$ around the frequency $-a^{-1}\omega_0$
“around” time b. Ignoring the minus sign (because $\psi(t)$ and $x(t)$ are typically real anyway), we therefore see
that the variable $a^{-1}$ is analogous to frequency. In wavelet literature, the quantity |a| is usually referred to
as the “scale” rather than “inverse frequency”.

For reasons which cannot be explained with our limited exposure so far, the wavelet function $\psi(t)$ is
restricted to be such that $\int \psi(t)\,dt = 0$. For the moment notice that this is equivalent to $\Psi(0) = 0$, which is
consistent with the bandpass property of $\psi(t)$. In Sec. 10.4, where we generate wavelets systematically using
multiresolution analysis, we will see that this condition follows naturally from theoretical considerations.
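The relation between the continuous transform and the dyadic coefficients can be sketched numerically. The example below takes the CWT in the form implied by the filter description above, $X(a,b) = |a|^{-1/2}\int x(t)\,\psi((t-b)/a)\,dt$ for a real wavelet, and assumes the Haar wavelet and a hypothetical test signal on [0,1). It evaluates $X(a,b)$ at $a = 2^{-k}$, $b = 2^{-k}n$, compares it with the inner product against $2^{k/2}\psi(2^k t - n)$, and also checks the zero-mean condition.

```python
import numpy as np

def haar_psi(t):
    """Haar wavelet, assumed +1 on [0, 1/2), -1 on [1/2, 1), zero elsewhere."""
    t = np.asarray(t, dtype=float)
    first_half = np.where((t >= 0) & (t < 0.5), 1.0, 0.0)
    second_half = np.where((t >= 0.5) & (t < 1), 1.0, 0.0)
    return first_half - second_half

def cwt_point(x, t, dt, psi, a, b):
    """Riemann-sum approximation of |a|^(-1/2) * integral of x(t) psi((t - b)/a) dt."""
    return np.sum(x * psi((t - b) / a)) * dt / np.sqrt(abs(a))

N = 2 ** 12
dt = 1.0 / N
t = (np.arange(N) + 0.5) * dt             # midpoint grid on [0, 1)
x = np.cos(2 * np.pi * 3 * t) + t ** 2    # arbitrary test signal

k, n = 3, 5
a, b = 2.0 ** (-k), 2.0 ** (-k) * n       # dyadic choice of scale and shift
X_ab = cwt_point(x, t, dt, haar_psi, a, b)

# Dyadic coefficient: inner product with 2^(k/2) psi(2^k t - n).
c_kn = np.sum(x * 2.0 ** (k / 2) * haar_psi(2.0 ** k * t - n)) * dt

# Zero-mean condition on the wavelet.
psi_mean = np.sum(haar_psi(t)) * dt
print(np.isclose(X_ab, c_kn), psi_mean)
```

The agreement follows from $(t-b)/a = 2^k t - n$ and $|a|^{-1/2} = 2^{k/2}$ for this choice of a and b.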

3.  THE  SHORT  TIME  FOURIER  TRANSFORM  (STFT) 

In many applications, we have to accommodate the notion of frequency that evolves or changes with time.
For example, audio signals are often regarded as signals with a time varying spectrum, e.g., a sequence of
short lived pitch frequencies. This idea cannot be expressed with the traditional FT since $X(\omega)$ for each $\omega$
depends on $x(t)$ for all t.

The short time Fourier transform (STFT) was introduced to provide such a time-frequency picture of
the signal [Gabor, 1946], [Flanagan and Golden, 1966], [Schafer and Rabiner, 1973], [Allen and Rabiner,
1977], and [Portnoff, 1980]. Here the signal $x(t)$ is multiplied with a window $v(t - \tau)$ centered or localized
around time $\tau$ (Fig. 3.1) and the FT of $x(t)v(t - \tau)$ computed:

$$X_{stft}(\omega, \tau) = \int_{-\infty}^{\infty} x(t)\, v(t - \tau)\, e^{-j\omega t}\, dt. \qquad (3.1)$$


Fig. 3.1. A signal $x(t)$, and the sliding window $v(t - \tau)$.


This is then repeated for shifted locations of the window, i.e., for various values of $\tau$. That is, we compute
not just one FT, but infinitely many. The result is a function of both time $\tau$ and frequency $\omega$. If this has
to be practical we have to make two changes: first, compute the STFT only for discrete values of $\omega$, and second,
use only a discrete number of window positions $\tau$. In the traditional STFT both $\omega$ and $\tau$ are discretized on
uniform grids:

$$\omega = k\omega_s, \quad \tau = nT_s. \qquad (3.2)$$

The STFT is thus defined as

$$X_{stft}(k\omega_s, nT_s) = \int_{-\infty}^{\infty} x(t)\, v(t - nT_s)\, e^{-jk\omega_s t}\, dt, \qquad (3.3)$$

which we abbreviate as $X_{stft}(k,n)$ when there is no confusion. Thus the time domain is mapped into the
time-frequency domain. The quantity $X_{stft}(k\omega_s, nT_s)$ represents the FT of $x(t)$ “around time $nT_s$” and
“around frequency $k\omega_s$”. This in essence is similar to the wavelet transform: in both cases the transform
domain is a two dimensional discrete domain.
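The discretized definition (3.3) can be sketched directly as a Riemann sum. In the example below the Gaussian window, the grid parameters $\omega_s = \pi$ and $T_s = 0.5$, and the test signal $\cos(5\pi t)$ are illustrative assumptions, not values taken from the text. Since the signal is a single sinusoid at $5\omega_s$, the magnitude of the coefficients should peak at k = 5 for every window position n.

```python
import numpy as np

def stft_coeff(x, t, dt, window, k, n, w_s, T_s):
    """Riemann-sum approximation of (3.3) at frequency k*w_s and time n*T_s."""
    return np.sum(x * window(t - n * T_s) * np.exp(-1j * k * w_s * t)) * dt

def gauss_window(s, sigma=0.5):
    """Assumed Gaussian window v(s)."""
    return np.exp(-s ** 2 / (2 * sigma ** 2))

dt = 0.005
t = np.arange(-5.0, 5.0, dt)
x = np.cos(5 * np.pi * t)              # single line at w = 5 * pi

w_s, T_s = np.pi, 0.5                  # assumed sampling grid
ks = np.arange(0, 11)                  # frequency indices
ns = np.arange(-4, 5)                  # time indices

# X[k, n] approximates X_stft(k * w_s, n * T_s).
X = np.array([[stft_coeff(x, t, dt, gauss_window, k, n, w_s, T_s) for n in ns]
              for k in ks])

peak_k = ks[np.argmax(np.abs(X), axis=0)]
print(peak_k)
```

The two indices of `X` are the discrete frequency and time variables, which is the two dimensional discrete transform domain referred to above.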
We will compare wavelets and STFT on several grounds. We will give a filter-bank view, and compare
time-frequency resolution and localization properties. In Sec. 9 we will compare them on deeper grounds.
For example, when can we reconstruct a signal $x(t)$ from the STFT coefficients $X_{stft}(k,n)$? Can we construct
an orthonormal basis for $L^2$ signals based on the STFT? And so forth. The advantage of wavelet transforms
over the STFT will be clear after these discussions.

3.1.  Filter  Bank  Interpretation 

The STFT evaluated for some frequency $\omega_k$ can be rewritten as

$$X_{stft}(\omega_k, \tau) = e^{-j\omega_k \tau} \int_{-\infty}^{\infty} x(t)\, v(t - \tau)\, e^{-j\omega_k (t - \tau)}\, dt. \qquad (3.4)$$

The integral looks like a convolution of $x(t)$ with the filter impulse response

$$h_k(t) = v(-t)\, e^{j\omega_k t}. \qquad (3.5)$$

If $v(-t)$ has a Fourier transform looking like a lowpass filter then $h_k(t)$ looks like a bandpass filter with center
frequency $\omega_k$ (Fig. 3.2). Thus, $X_{stft}(\omega_k, \tau)$ is the output of this bandpass filter at time $\tau$, downshifted in
frequency by $\omega_k$.


Fig.  3.2.  The  STFT  viewed  as  a  bandpass  filter  followed  by  a  downshifter. 
The result is a lowpass signal $y_k(t)$ whose output is sampled uniformly at time $\tau = nT_s$. For every frequency
$\omega_k$ so analyzed, there is one such filter channel. With the frequencies uniformly located at $\omega_k = k\omega_s$, we get
the analysis filter bank followed by downshifters and samplers as shown in Fig. 3.3.




Fig.  3.3.  The  STFT  viewed  as  an  analysis  bank  of  uniformly  shifted  filters. 


The STFT coefficients $X_{stft}(k\omega_s, nT_s)$ can therefore be regarded as the uniformly spaced samples of
the outputs of a bank of bandpass filters $H_k(\omega)$, all derived from one filter $h_0(t)$ by modulation:
$h_k(t) = e^{jk\omega_s t} h_0(t)$, i.e., $H_k(\omega) = H_0(\omega - k\omega_s)$. (The filters are one-sided in frequency so they have complex
coefficients in the time domain, but ignore these details for now.) The output of $H_k(\omega)$ represents a portion
of the FT $X(\omega)$ around the frequency $k\omega_s$. The downshifted version $y_k(t)$ is therefore a lowpass signal.
That is, it is a slowly varying signal, whose evolution as a function of t represents the evolution of the FT
$X(\omega)$ around frequency $k\omega_s$. By sampling this slowly varying signal we can therefore compress the transform
domain information.
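The filter bank interpretation can be checked numerically: the STFT at frequency $\omega_k$ should equal the output of the filter $h_k(t) = v(-t)e^{j\omega_k t}$, multiplied by $e^{-j\omega_k \tau}$. The sketch below assumes a Gaussian window, a hypothetical two-tone test signal, and a uniform time grid; the window is truncated to $|t| \le 2$, where it is already negligible.

```python
import numpy as np

dt = 0.01
t = -6.0 + dt * np.arange(1201)                 # time grid on [-6, 6]
x = np.cos(5 * np.pi * t) + 0.5 * np.cos(2 * np.pi * t)   # test signal

def v(s):
    """Assumed Gaussian window."""
    return np.exp(-s ** 2 / (2 * 0.3 ** 2))

w_k = 5 * np.pi                                 # analysis frequency

# Impulse response h_k(t) = v(-t) e^{j w_k t} on a symmetric grid of 2M + 1 taps.
M = 200
s = dt * np.arange(-M, M + 1)
h_k = v(-s) * np.exp(1j * w_k * s)

# Bandpass filter output on the grid t, then downshift by w_k.
bandpass = np.convolve(x, h_k, mode="same") * dt
filter_bank = np.exp(-1j * w_k * t) * bandpass

def stft_direct(tau):
    """Direct Riemann sum of x(t) v(t - tau) e^{-j w_k t}."""
    return np.sum(x * v(t - tau) * np.exp(-1j * w_k * t)) * dt

# Compare at interior time instants, away from the ends of the grid.
idx = np.arange(300, 901, 100)
direct = np.array([stft_direct(t[i]) for i in idx])
max_error = np.max(np.abs(filter_bank[idx] - direct))
print(max_error)
```

With an odd number of taps centered on zero, `mode="same"` aligns the convolution output with the input grid, so `filter_bank[i]` corresponds to $\tau = t_i$.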

If the window is narrow in the time domain, then $H_k(\omega)$ has large bandwidth. That is, we have good
time resolution and poor frequency resolution. If the window is wide the opposite is true. Thus if we try to
capture the local information in time by making a narrow window, then we get a fuzzy picture in frequency.
Conversely, in the limit as the filter becomes extremely localized in frequency, the window is very broad, and
the STFT approaches the ordinary FT. That is, the time-frequency information collapses to the all-frequency
information of ordinary FT. We see that time-frequency representation is inherently a compromise between
time and frequency resolutions (or localizations). This is related to the uncertainty principle: as windows
get narrow in time they have to get broad in frequency, and vice versa.

Optimal  time-frequency  resolution:  the  Gabor  window 

What is the best frequency resolution one can obtain for a given time resolution? That is, for a given
duration of the window $v(t)$, how small can the duration of $V(\omega)$ be? If we define duration according to
common sense we are already in trouble because if $v(t)$ has finite duration then $V(\omega)$ has infinite duration.
There is a more useful definition of duration called the root mean square (rms) duration. The rms time
duration $D_t$ and the rms frequency duration $D_f$ for the window $v(t)$ are defined such that

$$D_t^2 = \frac{\int t^2 |v(t)|^2\, dt}{\int |v(t)|^2\, dt}, \qquad D_f^2 = \frac{\int \omega^2 |V(\omega)|^2\, d\omega}{\int |V(\omega)|^2\, d\omega}. \qquad (3.6)$$


Intuitively we can see that $D_t$ cannot be arbitrarily small for a specified $D_f$. The uncertainty principle says
that $D_t D_f \ge 0.5$. Equality holds if and only if $v(t)$ has the shape of a Gaussian, i.e., $v(t) = Ae^{-\alpha t^2}$, $\alpha > 0$.

Thus the best joint time-frequency resolution is obtained by using the Gaussian window. This is also
intuitively acceptable for the reason that the Gaussian is its own FT (except for scaling of variables and
so forth). Gabor used the Gaussian window as early as 1946! Since it is of infinite duration, a truncated
approximation is used in practice. The STFT based on the Gaussian is called the Gabor transform. A
limitation of the Gabor transform is that it does not give rise to an orthonormal signal representation; in
fact it cannot even provide a “stable basis” (in Sec. 7 and 9 we explain the meaning of this).
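The uncertainty bound can be illustrated numerically. The sketch below computes the rms durations for a Gaussian window and for a raised-cosine window, both chosen here only for illustration. To avoid computing $V(\omega)$ explicitly it uses Parseval's relation: since the FT of $v'(t)$ is $j\omega V(\omega)$, the ratio defining $D_f^2$ equals $\int |v'(t)|^2 dt / \int |v(t)|^2 dt$.

```python
import numpy as np

def rms_durations(v, t):
    """Return (D_t, D_f) for samples v of a real window on the uniform grid t."""
    dt = t[1] - t[0]
    energy = np.sum(v ** 2) * dt
    D_t = np.sqrt(np.sum(t ** 2 * v ** 2) * dt / energy)
    # Parseval: integral of w^2 |V(w)|^2 over integral of |V(w)|^2
    # equals integral of |v'(t)|^2 over integral of |v(t)|^2.
    dv = np.gradient(v, dt)
    D_f = np.sqrt(np.sum(dv ** 2) * dt / energy)
    return D_t, D_f

t = np.linspace(-10.0, 10.0, 200_001)

# Gaussian window exp(-alpha t^2) with alpha = 2 (illustrative value).
gaussian = np.exp(-2.0 * t ** 2)

# Raised-cosine window on [-1, 1]; continuous, so D_f is finite.
raised_cosine = np.where(np.abs(t) <= 1.0, 0.5 * (1.0 + np.cos(np.pi * t)), 0.0)

product_gaussian = np.prod(rms_durations(gaussian, t))
product_raised_cosine = np.prod(rms_durations(raised_cosine, t))
print(product_gaussian, product_raised_cosine)
```

The Gaussian attains the bound 0.5 up to discretization error, while the raised-cosine window lies slightly above it. A window with a jump discontinuity, such as a rectangular window, would give an unbounded $D_f$ under this definition.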

3.2.  Wavelet  Transform  Versus  STFT 

The  STFT  works  with  a  fixed  window  v{t).  If  a  high  frequency  signal  is  being  analyzed,  many  cycles  are 
captured  by  the  window,  and  a  good  estimate  of  the  FT  is  obtained.  But  if  a  signal  varies  very  slowly  with 
respect  to  the  window  then  the  window  is  not  long  enough  to  capture  it  fully.  From  a  filter  bank  viewpoint, 
notice  that  all  the  filters  have  identical  band  widths  (Fig.  3.3).  This  means  that  the  frequency  resolution  is 
uniform  at  all  frequencies.  That  is,  the  “percentage  resolution”  or  accuracy  is  poor  for  low  frequencies  and 
becomes  better  and  better  at  high  frequencies.  The  STFT  therefore  does  not  provide  uniform  percentage 
accuracy  for  all  frequencies  —  the  computational  resources  are  somehow  poorly  distributed. 

Compare this with the wavelet transform, which is represented by a nonuniform filter bank (Fig. 2.8(b)).
Here the frequency resolution gets poorer as the frequency increases but the fractional resolution (i.e., the
filter bandwidth $\Delta\omega_k$ divided by the center frequency $\omega_k$) is constant for all k. That is, the percentage
accuracy is uniformly distributed in frequency. In the time domain this is roughly analogous to having a
large library of windows; the narrow window is used to analyze high frequency components and very broad
windows are used to analyze low frequency components. In electrical engineering language the filter bank
representing wavelet transforms is a constant Q filter bank, or an octave band filter bank.

Consider, for example, the Haar wavelet basis. Here the narrow basis functions $\psi_{2,n}(t)$ in Fig. 2.12
are useful to represent the highly varying components of the input, and are correspondingly narrower (have
shorter support) than the functions $\psi_{1,n}(t)$.


Fig.  3.4.  Time-frequency  tiling  schemes  for  (a)  STFT  and  (b)  the  wavelet  transform. 

A second difference between the STFT and wavelet transforms is the sampling rates at the outputs of
the bandpass filters. These are identical for the STFT filters (all filters have the same bandwidth). For the
wavelet filters, these are proportional to the filter bandwidths, hence nonuniform (Fig. 2.10(a)). This is
roughly analogous to the situation that the narrower windows move in smaller steps compared to the wider
windows. Compare again with Fig. 2.12 where $\psi_{2,n}(t)$ are moved in smaller steps compared to $\psi_{1,n}(t)$ in
the process of constructing the complete set of basis functions.

The  nonuniform  (constant  Q)  filter  stacking  (Fig.  2.8(b))  provided  by  wavelet  filters  is  also  naturally 
suited  for  analyzing  audio  signals  and  sometimes  even  as  components  in  the  modeling  of  the  human  hearing 
system. 

The  Time-Frequency  Tiling 

The fact that the STFT performs uniform sampling of time and frequency whereas the wavelet transform
performs nonuniform sampling is represented by the diagram shown in Fig. 3.4. Here the vertical lines
represent time locations where the analysis filter-bank output is sampled and the horizontal lines represent
the center frequencies of the bandpass filters. The time-frequency tiling for the STFT is a simple rectangular
grid, whereas for the wavelet transform it has a more complicated appearance.
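The two sampling schemes can be tabulated side by side. The sketch below is only illustrative: it assumes STFT grid parameters $\omega_s = \pi$ and $T_s = 1$, and models the wavelet channels as ideal octave bands $[2^k\pi, 2^{k+1}\pi]$ sampled at spacing $2^{-k}$. It then compares the fractional bandwidth (bandwidth divided by center frequency) of the two banks.

```python
import numpy as np

ks = np.arange(1, 6)                        # channel indices

# STFT: uniform grid (3.2); w_s and T_s are assumed illustrative values.
w_s, T_s = np.pi, 1.0
stft_center = ks * w_s                      # center frequencies k * w_s
stft_bandwidth = np.full(ks.shape, w_s)     # identical bandwidths
stft_spacing = np.full(ks.shape, T_s)       # identical sample spacing
stft_fractional = stft_bandwidth / stft_center

# Dyadic wavelet transform: octave bands, assumed to be [2^k pi, 2^(k+1) pi].
wt_lower = 2.0 ** ks * np.pi
wt_bandwidth = wt_lower                     # upper edge minus lower edge
wt_center = 1.5 * wt_lower                  # midpoint of each band
wt_spacing = 2.0 ** (-ks)                   # samples taken at t = 2^-k n
wt_fractional = wt_bandwidth / wt_center

for k, fs, fw, dw in zip(ks, stft_fractional, wt_fractional, wt_spacing):
    print(f"k={k}: STFT fractional bw={fs:.3f}, wavelet fractional bw={fw:.3f}, "
          f"wavelet sample spacing={dw:.4f}")
```

For the wavelet bank the fractional bandwidth is the same in every channel (constant Q), and the product of bandwidth and sample spacing is also constant, so every tile of the wavelet tiling has the same area even though its shape changes with k.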

Example  3.1.  Wavelet  Transform  Versus  STFT 

Consider the signal $x(t) = \cos(10\pi t) + 0.5\cos(5\pi t) + 1.2\delta_a(t - 0.07) + 1.2\delta_a(t + 0.07)$. It has impulses at
$t = \pm 0.07$ in the time domain. There are two impulses (or “lines”) in the frequency domain, at $\omega_1 = 5\pi$ and
$\omega_2 = 10\pi$. The function is shown in Fig. 3.5 (with impulses replaced by narrow pulses). The aim is to try to
compute the STFT or WT such that the impulses in time as well as those in frequency are resolved. Figure
3.6 (a)-(c) shows the STFT plot for three widths of the window $v(t)$ and Fig. 3.6(d) shows the wavelet plot.
The details of the window $v(t)$ and the wavelet $\psi(t)$ used for this example will be described below, but first
let us concentrate on the features of these plots.


Fig.  3.5.  Example  3.1.  The  signal  to  be  analyzed  by  STFT  and  Wavelet  transform. 

The STFT plots are time-frequency plots, whereas the wavelet plots are $(a^{-1}, b)$ plots where a and b are
defined by (2.21). As explained in Sec. 2.8, the quantity $a^{-1}$ is analogous to “frequency” in the STFT, and b
is  analogous  to  “time”  in  the  STFT.  The  brightness  of  the  plots  in  Fig.  3.6  is  proportional  to  the  magnitude 
of  the  STFT  or  WT,  so  the  transform  is  close  to  zero  in  the  dark  regions.  We  see  that  for  a  narrow  window 
with  width  =  0.1,  the  STFT  resolves  the  two  impulses  in  time  reasonably  well,  but  the  impulses  in  frequency 
are  not  resolved.  For  a  wide  window  with  width  =  1.0,  the  STFT  resolves  the  “lines”  in  frequency  very  well, 
but  not  the  time  domain  impulses.  For  an  intermediate  window  width  =  0.3,  the  resolution  is  poor  in  both 
time  and  frequency.  The  wavelet  transform  plot  (Fig.  3.6  (d)),  on  the  other  hand,  simultaneously  resolves 
both  time  and  frequency  very  well.  We  can  clearly  see  the  locations  of  the  two  impulses  in  time,  as  well  as 
the  two  lines  in  frequency. 

Now for the details. The STFT for this example was computed using the Hamming window [Oppenheim
and Schafer, 1989] defined as $v(t) = c[0.54 + 0.46\cos(\pi t/D)]$ for $-D \le t \le D$ and zero outside. The “widths”
indicated in the figure correspond to D = 0.1, 1.0, and 0.3 (though the two-sided width is twice this). The
wavelet transform was computed by using an example of the Morlet wavelet [Daubechies, 1992]. Specifically,

$$\psi(t) = e^{-t^2/16}\left(e^{j\pi t} - a\right).$$


Fig. 3.7. Example 3.1. Fourier transform magnitude for the Morlet wavelet.

First let us understand what this wavelet function is doing. The quantity $e^{-t^2/16}$ is the Gaussian (except
for a constant scalar factor) with Fourier transform $4\sqrt{\pi}\, e^{-4\omega^2}$, which is again Gaussian, concentrated near
$\omega = 0$. Thus $e^{-t^2/16} e^{j\pi t}$ has an FT concentrated around $\omega = \pi$. Ignoring† the second term a in the expression
for $\psi(t)$, we see that the wavelet is a narrowband bandpass filter concentrated around $\pi$ (Fig. 3.7). If we set
a = 1 in (2.21), then $X(1,b)$ represents the frequency contents around $\pi$. Thus, the frequencies $\omega_1 = 5\pi$ and
$\omega_2 = 10\pi$ in the given signal $x(t)$ show up around points $a^{-1} = 5$ and $a^{-1} = 10$ in the wavelet transform
plot, as seen from Fig. 3.6(d). In the STFT plots, we have shown the frequency axis as $\omega/\pi$ so that the
frequencies $\omega_1$ and $\omega_2$ show up at 5 and 10, making it easy to compare the STFT plots with the wavelet
plot.

† The quantity a in the expression of $\psi(t)$ is there to ensure that $\int \psi(t)\,dt = 0$ (Sec. 2.8). Since a is very
small, it does not significantly affect the plots in Fig. 3.6.
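The frequency-resolution part of Example 3.1 can be reproduced with a short sketch. It assumes the signal of the example with the impulses replaced by one-sample pulses of area 1.2, the Hamming window $0.54 + 0.46\cos(\pi t/D)$ on $|t| \le D$ with the scale factor c set to 1, and an evaluation time $\tau = 2$ chosen away from the impulses. The "dip ratio" used to quantify resolution is an ad hoc measure introduced here: the STFT magnitude midway between the two lines, divided by the magnitude at the weaker line.

```python
import numpy as np

dt = 0.001
t = np.arange(-1.0, 4.0, dt)
x = np.cos(10 * np.pi * t) + 0.5 * np.cos(5 * np.pi * t)
for t0 in (-0.07, 0.07):
    # Impulses approximated by one-sample pulses of area 1.2.
    x[np.argmin(np.abs(t - t0))] += 1.2 / dt

def hamming(s, D):
    """Hamming window of half-width D (scale factor c assumed equal to 1)."""
    return np.where(np.abs(s) <= D, 0.54 + 0.46 * np.cos(np.pi * s / D), 0.0)

def stft_mag(w, tau, D):
    """Magnitude of the Riemann-sum STFT at frequency w and time tau."""
    return abs(np.sum(x * hamming(t - tau, D) * np.exp(-1j * w * t)) * dt)

def dip_ratio(D, tau=2.0):
    """|STFT| midway between the two lines, relative to the weaker line."""
    weaker_line = min(stft_mag(5 * np.pi, tau, D), stft_mag(10 * np.pi, tau, D))
    return stft_mag(7.5 * np.pi, tau, D) / weaker_line

ratio_narrow = dip_ratio(0.1)   # narrow window
ratio_wide = dip_ratio(1.0)     # wide window
print(ratio_narrow, ratio_wide)
```

A small ratio means that the spectrum drops clearly between the two lines, so they are resolved; a ratio near one means they have merged into a single broad lobe. This sketch looks only at frequency resolution; the complementary behavior in time could be examined in the same way by scanning $\tau$ across the impulses.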

Mathematical  Issues  to  be  Addressed 

While  the  filter  bank  view  point  places  wavelets  and  STFT  on  a  unified  ground,  several  mathematical 
issues  still  remain  to  be  addressed.  It  is  this  deeper  study  that  brings  forth  further  subtle  differences,  giving 
wavelets  a  definite  advantage  over  the  STFT. 

Returning to the STFT, let us say that we start from a signal $x(t) \in L^2$ and compute the STFT
coefficients $X(k\omega_s, nT_s)$. How should we choose the sampling periods $T_s$ and $\omega_s$ of the time and frequency
grids so that we can reconstruct $x(t)$ from the STFT coefficients? (Remember that we are not talking
about bandlimited signals, and there is no sampling theorem at work.) If the filters $H_k(\omega)$ are ideal one-sided
bandpass filters with bandwidth $\omega_s$, the downshifted lowpass outputs $y_k(t)$ (Fig. 3.2) can be sampled
separately at the Nyquist rate $\omega_s$ or higher. This then tells us that $T_s \le 2\pi/\omega_s$, that is

$$\omega_s T_s \le 2\pi. \qquad (3.7)$$

However, the use of ideal filters implies an impractical window $v(t)$.

If we use a practical window (e.g., one of finite duration) then how should we choose $T_s$ in relation to
$\omega_s$ so that we can reconstruct $x(t)$ from the STFT coefficients $X(k\omega_s, nT_s)$? Is this a stable reconstruction,
that is, if we make a small error in some STFT coefficient does it affect the reconstructed signal in an
unbounded manner? Finally, does the STFT provide an orthonormal basis for $L^2$? These questions are deep
and interesting, and require more careful treatment. We will do this in Sec. 9.

4.  DIGITAL  FILTER  BANKS  AND  SUBBAND  CODERS 

We will now discuss a totally different set up, namely a discrete-time filter bank or a digital filter bank. This
has some qualitative resemblance to the continuous time filter banks which were used to represent the STFT
and wavelet transform earlier. An example of a digital filter bank is shown in Fig. 4.1(a) where $x(n)$ is a
discrete time signal (sequence). Here $G_a(z)$ and $H_a(z)$ are two digital filters, typically lowpass and highpass,
which split $x(n)$ into two subbands. The subband signals $x_0(n)$ and $x_1(n)$ are downsampled or decimated
(see below for definitions). The total subband data rate counting both subbands is then equal to the number
of samples per unit time in the original signal $x(n)$. There are extensions of this system for more than two
subbands, called M channel maximally decimated filter banks.
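A minimal sketch of a two channel maximally decimated filter bank is given below. It assumes the simplest possible choice, two-tap Haar filters (sum and difference of adjacent samples, scaled by $1/\sqrt{2}$), written directly in block form so that filtering and decimation by two happen in one step; the function names are illustrative and this is not meant as the filter choice of Fig. 4.1.

```python
import numpy as np

def analysis(x):
    """Haar analysis bank with decimation by two (len(x) must be even)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    x0 = (even + odd) / np.sqrt(2)    # lowpass subband, half the input rate
    x1 = (even - odd) / np.sqrt(2)    # highpass subband, half the input rate
    return x0, x1

def synthesis(x0, x1):
    """Matching synthesis bank: expand by two and recombine the subbands."""
    y = np.empty(2 * len(x0))
    y[0::2] = (x0 + x1) / np.sqrt(2)
    y[1::2] = (x0 - x1) / np.sqrt(2)
    return y

n = np.arange(32)
x = np.cos(0.2 * n) + 0.1 * (-1.0) ** n     # slowly varying part plus a fast part

x0, x1 = analysis(x)
y = synthesis(x0, x1)

print(len(x0) + len(x1) == len(x))          # total subband rate equals input rate
print(np.allclose(y, x))                    # this particular pair reconstructs x
```

The two subbands together contain exactly as many samples as the input, which is the sense in which the system is maximally decimated. For this particular filter pair the input is also recovered exactly, and the subband energies add up to the input energy.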

There are several reasons for discussing digital filter banks in this chapter. First, they provide a
time-frequency representation for discrete time signals, similar to the STFT and wavelet transforms for continuous
time signals. Second, there is a deep mathematical connection between this digital filter bank and the
continuous time wavelet transform. This fundamental relation, discovered by Daubechies [1988], is fully
elaborated later in Sec. 10-13, and is what makes the wavelet transform so easy to design, and attractive to
implement in practice. This relation also opens up a little world of beautiful research problems for engineers
as well as mathematicians. Indeed, the recent boom of interest in wavelet transforms can be traced back to
the discovery of this relation.


Fig. 4.1. (a) The two channel digital filter bank (analysis filters, decimators, expanders, synthesis filters),
(b) typical filter responses, and (c) typical input spectrum.


A  few  words  on  subband  coding. 

The  most  common  application  of  the  digital  filter  bank  is  in  subband  coding.  The  basic  idea  can  be 
explained using the following simple example: suppose $x(n)$ has its energy mostly in the lowpass region (e.g.,
as in speech or music). Then we can use lowpass and highpass filters $G_a(z)$ and $H_a(z)$ to split $x(n)$ into
subbands. This is demonstrated in Fig. 4.1(b),(c). We then assign more bits to the lower subband $x_0(n)$
than to the higher subband $x_1(n)$. For a fixed bit rate, this allows a judicious utilization of bits. This idea
is used in practice in a more elaborate manner; for example $x(n)$ is split into more than two bands, often into
unequal bandwidths (using the so-called nonuniform filter banks). Then an optimal subband bit-allocation
strategy is adopted. This allocation takes into account not only the energy distribution in the subbands,
but also perceptual considerations such as the masking property of the ear as a function of frequency. In
this way very significant compression can be achieved. For example, digital audio (CD music) has been
compressed by more than a factor of four [Veldhuis, et al., 1989]. In image compression applications [Woods,
1991]  subband  coding  is  often  used  in  combination  with  other  nonlinear  filtering  operations.  Subband  coding 
has  a  long  history  and  a  rich  list  of  applications,  but  we  will  not  go  into  these  here. 

Detailed  studies  of  filter  banks  and  multirate  systems  can  be  found  in  a  number  of  references,  see  for 
example  [Vaidyanathan,  1993]  and  references  therein.  Our  treatment  here  will  be  brief,  the  aim  being  to  lay 
the  foundation  that  will  enable  us  to  explain  the  connection  to  wavelet  transforms. 

4.1.  The  Multirate  Signal  Processing  Building  Blocks 

The  building  blocks  in  the  digital  filter  bank  of  Fig.  4.1(a)  are  digital  filters,  decimators,  and  expanders. 
The $M$-fold decimator or downsampler (denoted $\downarrow M$) is defined by the input-output relation $y(n) = x(Mn)$.
For example, the two-fold decimator retains the even numbered samples and drops the odd ones. For $M = 2$
the input-output relation in the $z$-domain becomes

$$Y(z) = 0.5\,[X(z^{1/2}) + X(-z^{1/2})], \quad \text{that is,} \quad Y(e^{j\omega}) = 0.5\,[X(e^{j\omega/2}) + X(e^{j(\omega - 2\pi)/2})], \qquad (4.1)$$

whereas for arbitrary $M$ it is $Y(z) = (1/M)\sum_{k=0}^{M-1} X(z^{1/M}e^{-j2\pi k/M})$. This relation is sometimes abbreviated
by the notation $Y(z) = X(z)\big|_{\downarrow M}$ or $Y(e^{j\omega}) = X(e^{j\omega})\big|_{\downarrow M}$.

There are two terms in the expression (4.1) for $Y(e^{j\omega})$. The first term $X(e^{j\omega/2})$ represents a stretched
version of $X(e^{j\omega})$. The second term $X(e^{j(\omega - 2\pi)/2})$ is a shifted version of this stretched version, the shift
being by an amount $2\pi$. In general the shifted version may overlap with the stretched version. In this case
we cannot recover the input signal from the decimated version. This is exactly the effect of aliasing created
by undersampling. If the original spectrum $X(e^{j\omega})$ is bandlimited to $[-\pi/2, \pi/2]$, two-fold decimation does not
cause aliasing.

The $M$-fold expander or upsampler (denoted $\uparrow M$) is defined by

$$y(n) = \begin{cases} x(n/M), & n \text{ a multiple of } M, \\ 0, & \text{otherwise.} \end{cases} \qquad (4.2)$$

Thus the expander simply inserts $M - 1$ zero-valued samples between adjacent samples of the input. In the
transform domain the relation is $Y(z) = X(z^M)$, that is, $Y(e^{j\omega}) = X(e^{j\omega M})$. Thus the expander squeezes the
Fourier transform by $M$, so that the period becomes $2\pi/M$ rather than $2\pi$. The basic shape of the spectrum
is preserved, consistent with the fact that in the time domain there is no loss of information; we can always
recover the input to the expander from its output, just by decimating it.
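
The two building blocks are easy to try out numerically. The following is a minimal NumPy sketch, not taken from the chapter; the helper names `decimate`, `expand` and `dtft` are illustrative choices. It checks relation (4.1) for the two-fold decimator and the relation $Y(e^{j\omega}) = X(e^{j\omega M})$ for the expander at one arbitrarily chosen frequency.

```python
import numpy as np

def decimate(x, M):
    """M-fold decimator: y(n) = x(Mn)."""
    return x[::M]

def expand(x, M):
    """M-fold expander: insert M-1 zeros between adjacent samples."""
    y = np.zeros(M * len(x), dtype=x.dtype)
    y[::M] = x
    return y

def dtft(s, w):
    """Evaluate sum_n s(n) e^{-jwn} for a finite sequence starting at n = 0."""
    n = np.arange(len(s))
    return np.sum(s * np.exp(-1j * w * n))

x = np.arange(1, 9)             # x(0..7) = 1, 2, ..., 8
y_dec = decimate(x, 2)          # keeps x(0), x(2), x(4), x(6)
y_exp = expand(x, 2)            # x(0), 0, x(1), 0, ...
recovered = decimate(y_exp, 2)  # decimating the expander output

w = 0.7                         # arbitrary test frequency
# Decimator, relation (4.1): Y(e^{jw}) = 0.5 [X(e^{jw/2}) + X(e^{j(w-2pi)/2})]
dec_lhs = dtft(y_dec, w)
dec_rhs = 0.5 * (dtft(x, w / 2) + dtft(x, (w - 2 * np.pi) / 2))
# Expander: Y(e^{jw}) = X(e^{j2w})
exp_lhs = dtft(y_exp, w)
exp_rhs = dtft(x, 2 * w)
```

Decimating the expander output returns the original samples, in line with the remark that the expander loses no information.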

4.2.  Reconstruction  from  Subbands 

In many applications it is desirable to reconstruct $x(n)$ from the decimated subband signals $y_k(n)$ (possibly
after quantization). For this, we pass $y_k(n)$ through expanders and combine them with synthesis filters $G_s(z)$
and $H_s(z)$ as shown in Fig. 4.1(a). The system is said to have the perfect reconstruction (PR) property if
$\hat{x}(n) = c\,x(n - n_0)$ for some $c \neq 0$ and $n_0$. In general the PR property is not satisfied for several reasons.
First,  there  is  subband  quantization  and  bit  allocation,  which  is  the  key  to  data  compression  using  subband 
techniques.  But  since  our  interest  here  is  in  the  connection  between  filter  banks  and  wavelets,  we  will  not 
be  concerned  with  subband  quantization  here. 

Second, since the filters $G_a(z)$ and $H_a(z)$ are not ideal there is aliasing due to decimation. Using the
above equations for the decimator and expander building blocks, we can obtain the following expression for
the reconstructed signal:

$$\hat{X}(z) = 0.5\,[G_a(z)G_s(z) + H_a(z)H_s(z)]\,X(z) + 0.5\,[G_a(-z)G_s(z) + H_a(-z)H_s(z)]\,X(-z). \qquad (4.3)$$

The second term, which contains $X(-z)$, arises from the term $X(-z^{1/2})$ created by the decimator (see (4.1)) and is
therefore the aliasing term. This can be eliminated by designing the filters such that

$$G_a(-z)G_s(z) + H_a(-z)H_s(z) = 0 \quad \text{(alias cancellation)}. \qquad (4.4)$$

Assume that this has been satisfied. By setting

$$G_a(z)G_s(z) + H_a(z)H_s(z) = 1 \qquad (4.5)$$

we can then obtain $\hat{X}(z) = 0.5\,X(z)$, implying perfect reconstruction.

There are many ways to satisfy the perfect reconstruction conditions [Smith and Barnwell, 1984],
[Vaidyanathan, 1987], [Vetterli, 1987]. For this chapter we are interested in a particular technique to satisfy (4.4)
and (4.5). This has been called the conjugate quadrature filter (CQF) method, and was independently
reported in [Smith and Barnwell, 1984] and [Mintzer, 1985]. It was shown later [Vaidyanathan, 1987] that these
constructions are examples of a general class of $M$-channel filter banks called paraunitary filter banks. The
two channel CQF solution was later rediscovered in the totally different contexts of multiresolution analysis
[Mallat, 1989] and compactly supported orthonormal wavelet construction [Daubechies, 1988]. These will be
discussed in later sections.

The CQF solution. Suppose the analysis filter $G_a(z)$ is chosen such that it satisfies the condition

$$\tilde{G}_a(z)G_a(z) + \tilde{G}_a(-z)G_a(-z) = 1 \quad \text{for all } z. \qquad (4.6)$$

If we now choose the analysis filter $H_a(z)$ and the two synthesis filters as

$$H_a(z) = z^{-1}\tilde{G}_a(-z), \quad G_s(z) = \tilde{G}_a(z), \quad H_s(z) = \tilde{H}_a(z), \qquad (4.7)$$

then substitution shows that (4.4) and (4.5) are indeed satisfied. There is perfect reconstruction, that is,
$\hat{x}(n) = 0.5\,x(n)$. In the time domain the above equations can be written as

$$h_a(n) = -(-1)^n g_a^*(-n+1), \quad g_s(n) = g_a^*(-n), \quad h_s(n) = h_a^*(-n). \qquad (4.8)$$

The synthesis filters are time-reversed conjugates of the analysis filters. If we design a filter $G_a(z)$ satisfying
the single condition (4.6) and determine the remaining three filters as above, then the system has the PR
property! A filter $G_a(z)$ satisfying (4.6) is said to be power-symmetric. Readers familiar with half-band
filters will notice that the condition (4.6) says simply that $\tilde{G}_a(z)G_a(z)$ is half-band! We will return to this
in Sec. 4.4.

Design procedure. The procedure to design a perfect reconstruction CQF system is very simple. We
first design a zero-phase lowpass half-band filter $G(z)$ with $G(e^{j\omega}) \geq 0$ and then extract a spectral factor
$G_a(z)$, that is, find $G_a(z)$ such that $G(z) = \tilde{G}_a(z)G_a(z)$. Once the lowpass filter $G_a(z)$ is found like this, the
three remaining filters can be found from (4.7).
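
As a rough numerical illustration of the CQF relations, the sketch below runs a random signal through a two channel bank built from a single power-symmetric lowpass filter. It is a sketch under stated assumptions rather than the chapter's own procedure: the lowpass filter is the well-known 4-tap Daubechies filter rescaled so that its energy is 0.5 (the normalization implied by (4.6)), the spectral factorization step is skipped, and the filters of (4.7) are delayed so that everything is causal. With filter order $N = 3$ the expected output is therefore $0.5\,x(n - N)$ instead of $0.5\,x(n)$.

```python
import numpy as np

# Assumed lowpass filter: 4-tap Daubechies filter scaled so sum(g_a**2) = 0.5.
s3 = np.sqrt(3.0)
g_a = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0
N = len(g_a) - 1                      # filter order (odd)

# Power symmetry (4.6): the autocorrelation of g_a must be half-band,
# i.e. 0.5 at lag 0 and zero at every other even lag.
r = np.correlate(g_a, g_a, mode="full")   # lags -N..N, lag 0 at index N
even_lags = r[N % 2::2]                    # lags -2, 0, 2 when N = 3

# Causal versions of (4.7)/(4.8) for real coefficients.
n = np.arange(N + 1)
h_a = -((-1.0) ** n) * g_a[::-1]      # (4.8) delayed by N - 1 samples
g_s = g_a[::-1]                       # time-reversed g_a, delayed by N
h_s = h_a[::-1]                       # time-reversed h_a, delayed by N

def expand2(y):
    """Two-fold expander."""
    u = np.zeros(2 * len(y))
    u[::2] = y
    return u

rng = np.random.default_rng(0)
x = rng.standard_normal(64)

x0 = np.convolve(x, g_a)[::2]         # lowpass subband after decimation
x1 = np.convolve(x, h_a)[::2]         # highpass subband after decimation
x_hat = np.convolve(expand2(x0), g_s) + np.convolve(expand2(x1), h_s)

# Perfect reconstruction up to scale 0.5 and delay N.
err = np.max(np.abs(x_hat[N:N + len(x)] - 0.5 * x))
```

The aliasing terms cancel exactly, so `err` is at the level of floating-point rounding.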

4.3.  The  Polyphase  Representation 

Polyphase representations for transfer functions were introduced first by Bellanger et al. [1976]. The
polyphase representation of a filter bank gives a convenient platform for studying theoretical questions
and also helps in the design and implementation of PR filter banks. Let $H(z) = \sum_n h(n)z^{-n}$ be any
transfer function. We can express it in the form $H(z) = E_0(z^2) + z^{-1}E_1(z^2)$ by splitting the $z$-transform into
even powers of $z$ and odd powers of $z$. This is called a polyphase decomposition and $E_k(z)$ are called the
polyphase components.

Fig. 4.2. (a) The polyphase form of the filter bank, (b) further simplification, and
(c) equivalent structure when $\mathbf{R}(z) = \mathbf{E}^{-1}(z)$.
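
The polyphase decomposition $H(z) = E_0(z^2) + z^{-1}E_1(z^2)$ can be checked on a small example. The following sketch uses an arbitrary 5-tap filter (the coefficient values and helper names are illustrative assumptions). It verifies the decomposition at one point in the $z$-plane and shows that filtering followed by two-fold decimation can be computed entirely at the low rate from the even and odd input samples.

```python
import numpy as np

h = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # arbitrary FIR filter h(0..4)
e0 = h[0::2]    # coefficients of E_0(z): h(0), h(2), h(4)
e1 = h[1::2]    # coefficients of E_1(z): h(1), h(3)

def evalz(c, z):
    """Evaluate sum_n c[n] z^{-n} at a complex point z."""
    return sum(c[k] * z ** (-k) for k in range(len(c)))

z = 0.9 * np.exp(0.3j)                      # arbitrary test point
H_direct = evalz(h, z)
H_poly = evalz(e0, z ** 2) + z ** (-1) * evalz(e1, z ** 2)

# Filtering then decimating by two ...
rng = np.random.default_rng(0)
x = rng.standard_normal(32)
direct = np.convolve(x, h)[::2]

# ... equals the sum of two low-rate convolutions on the polyphase branches.
x_even = x[0::2]                            # x(2n)
x_odd = np.concatenate(([0.0], x[1::2]))    # x(2n - 1), taking x(-1) = 0
b0 = np.convolve(x_even, e0)
b1 = np.convolve(x_odd, e1)
poly = np.zeros(max(len(b0), len(b1)))
poly[:len(b0)] += b0
poly[:len(b1)] += b1
```

The low-rate form performs roughly half the multiplications per input sample, which is one reason polyphase structures are attractive in implementations.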


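Similarly we can also write $H(z) = R_0(z^2) + zR_1(z^2)$, where $R_0(z) = E_0(z)$ and $R_1(z) = z^{-1}E_1(z)$. We will
express the analysis filters $G_a(z)$ and $H_a(z)$ in the polyphase form

$$\begin{bmatrix} G_a(z) \\ H_a(z) \end{bmatrix} = \underbrace{\begin{bmatrix} E_{00}(z^2) & E_{01}(z^2) \\ E_{10}(z^2) & E_{11}(z^2) \end{bmatrix}}_{\mathbf{E}(z^2)} \begin{bmatrix} 1 \\ z^{-1} \end{bmatrix} \qquad (4.9)$$

and the synthesis filters in the polyphase form

$$\begin{bmatrix} G_s(z) & H_s(z) \end{bmatrix} = \begin{bmatrix} 1 & z \end{bmatrix} \underbrace{\begin{bmatrix} R_{00}(z^2) & R_{01}(z^2) \\ R_{10}(z^2) & R_{11}(z^2) \end{bmatrix}}_{\mathbf{R}(z^2)}. \qquad (4.10)$$

The $2 \times 2$ matrices $\mathbf{E}(z)$ and $\mathbf{R}(z)$ are called, respectively, the polyphase matrices of the analysis and synthesis
banks. Fig. 4.2(a) shows a redrawing of the complete filter bank using the polyphase representation. It can
be shown that the decimator and expander can be moved past $\mathbf{E}(z^2)$ and $\mathbf{R}(z^2)$, which contain only even powers
of $z$, to obtain the simplified representation shown in Fig. 4.2(b).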

If we impose the condition $\mathbf{R}(z)\mathbf{E}(z) = \mathbf{I}$, that is,

$$\mathbf{R}(z) = \mathbf{E}^{-1}(z), \qquad (4.11)$$

then the system reduces to Fig. 4.2(c), which is a perfect reconstruction (PR) system with $\hat{x}(n) = x(n)$. Eqn.
(4.11) will be called the perfect reconstruction condition. Notice that insertion of arbitrary scale factors and
delays to obtain $\mathbf{R}(z) = cz^{-K}\mathbf{E}^{-1}(z)$ leads to $\hat{x}(n) = c\,x(n - 2K)$, which is still the PR property.

4.4.  The  Paraunitary  Perfect  Reconstruction  System 

It  was  pointed  out  in  [Vaidyanathan,  1987]  that  the  CQF  solution  has  a  mathematical  property  called  the 
paraunitary  property  and  this  observation  makes  it  possible  to  generalize  the  perfect  reconstruction  solution 
to  M-channel  filter  banks  for  arbitrary  M.  Since  then,  a  class  of  filter  banks  called  unitary  or  paraunitary 
filter  banks  has  been  developed.  The  paraunitary  property,  specialized  to  the  two  channel  case,  naturally 
yields  all  the  CQF  equations. 

Definition 4.1. Paraunitary matrices. A transfer matrix$^1$ $\mathbf{H}(z)$ is said to be paraunitary if $\mathbf{H}(e^{j\omega})$
is unitary, that is, $\mathbf{H}^\dagger(e^{j\omega})\mathbf{H}(e^{j\omega}) = \mathbf{I}$ for all $\omega$. In all practical designs the filters are rational transfer
functions so that the paraunitary condition implies $\tilde{\mathbf{H}}(z)\mathbf{H}(z) = \mathbf{I}$ for all $z$, where the notation $\tilde{\mathbf{H}}(z)$ is
explained in Sec. 1. $\diamond$

We sometimes allow a positive constant $\alpha > 0$, and say that $\mathbf{H}(z)$ is paraunitary if $\tilde{\mathbf{H}}(z)\mathbf{H}(z) =
\alpha\mathbf{I}$. Proper choice of $\alpha$ simplifies notations. Note that $\tilde{\mathbf{H}}(z)$ reduces to transpose conjugation $\mathbf{H}^\dagger(e^{j\omega})$
on the unit circle. The paraunitary property has played a fundamental role in electrical network theory
[Brune,  1931],  [Belevitch,  1968]  and  has  a  rich  history  (see  references  in  Chap.  6  and  14  of  [Vaidyanathan, 
1993]).  Essentially,  the  scattering  matrices  of  lossless  (LC)  multiports  are  paraunitary,  that  is  unitary  on 
the  imaginary  axis  of  the  s-plane. 

Properties  of  Paraunitary  Filter  Banks 

A filter bank in which $\mathbf{E}(z)$ is paraunitary and $\mathbf{R}(z) = \tilde{\mathbf{E}}(z)$ has the perfect reconstruction property
$\hat{x}(n) = c\,x(n)$, $c \neq 0$. We will now study the properties of such systems, which are called paraunitary filter
banks.

$^1$ Transfer matrices are essentially transfer functions of multi-input multi-output systems. A review can
be found in Chap. 13 of [Vaidyanathan, 1993].

All of these properties follow from a fundamental matrix equation which we first derive. By replacing
$z$ with $-z$ in Eqn. (4.9) and rearranging, we obtain


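$$\mathbf{G}_a(z) \triangleq \begin{bmatrix} G_a(z) & G_a(-z) \\ H_a(z) & H_a(-z) \end{bmatrix} = \mathbf{E}(z^2)\begin{bmatrix} 1 & 1 \\ z^{-1} & -z^{-1} \end{bmatrix}. \qquad (4.12)$$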


Similarly  for  the  synthesis  filters, 


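$$\mathbf{G}_s(z) \triangleq \begin{bmatrix} G_s(z) & H_s(z) \\ G_s(-z) & H_s(-z) \end{bmatrix} = \begin{bmatrix} 1 & z \\ 1 & -z \end{bmatrix}\mathbf{R}(z^2). \qquad (4.13)$$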


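Let $\mathbf{E}(z)$ and $\mathbf{R}(z)$ be paraunitary with $\tilde{\mathbf{E}}(z)\mathbf{E}(z) = 0.5\,\mathbf{I}$ and $\tilde{\mathbf{R}}(z)\mathbf{R}(z) = 0.5\,\mathbf{I}$. Then the above equations
imply

$$\tilde{\mathbf{G}}_a(z)\mathbf{G}_a(z) = \mathbf{I}, \qquad \tilde{\mathbf{G}}_s(z)\mathbf{G}_s(z) = \mathbf{I}. \qquad (4.14)$$

That is, the matrices $\mathbf{G}_a(z)$ and $\mathbf{G}_s(z)$ defined above are paraunitary as well. We will next draw a number
of conclusions from this.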

A word on language. We often say $\{G_a(z), H_a(z)\}$ is paraunitary. By this we mean that the
corresponding polyphase matrix is paraunitary.

Half-band  Property  and  Power  Symmetry  Property. 

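The paraunitary property $\tilde{\mathbf{G}}_a(z)\mathbf{G}_a(z) = \mathbf{I}$ is also equivalent to $\mathbf{G}_a(z)\tilde{\mathbf{G}}_a(z) = \mathbf{I}$ which implies, in
particular, the equation

$$\tilde{G}_a(z)G_a(z) + \tilde{G}_a(-z)G_a(-z) = 1. \qquad (4.15)$$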


In other words, $G_a(z)$ is a power symmetric filter (Sec. 4.2). Now, a transfer function $G(z)$ satisfying
$G(z) + G(-z) = 1$ is called a half-band filter. The impulse response of such $G(z)$ satisfies $g(2n) = 0$ for
all $n \neq 0$ and $g(0) = 0.5$. We see that the power symmetry property of $G_a(z)$ says that $\tilde{G}_a(z)G_a(z)$ is a
half-band filter. In terms of frequency response, the power symmetry property of $G_a(z)$ is equivalent to

$$|G_a(e^{j\omega})|^2 + |G_a(-e^{j\omega})|^2 = 1. \qquad (4.16)$$


Imagine that $G_a(z)$ is a real-coefficient lowpass filter so that $|G_a(e^{j\omega})|^2$ has symmetry with respect to zero
frequency. Then $|G_a(-e^{j\omega})|^2$ is as demonstrated in Fig. 4.3, and the power symmetry property means that
the two plots in Fig. 4.3 add up to unity. In this figure, $\omega_p$ and $\omega_s$ are the bandedges, and $\delta_1$ and $\delta_2$ are the
peak passband and stopband ripples of $G_a(e^{j\omega})$ (for definitions of filter specifications see [Oppenheim and Schafer, 1989]
or [Vaidyanathan, 1993]).



Fig. 4.3. The magnitude responses $|G_a(e^{j\omega})|^2$ and $|G_a(-e^{j\omega})|^2$ for a
real coefficient power symmetric filter $G_a(z)$.

Notice in particular that power symmetry of $G_a(z)$ implies that there is a symmetry relation between
the passband and stopband specifications of $G_a(e^{j\omega})$. This relation is given by

$$\omega_s = \pi - \omega_p, \qquad \delta_2^2 = 1 - (1 - 2\delta_1)^2. \qquad (4.17)$$

Relation between the two analysis filters. It can be shown that the property $\tilde{\mathbf{G}}_a(z)\mathbf{G}_a(z) = \mathbf{I}$
implies a relation between the analysis filters $G_a(z)$ and $H_a(z)$, namely $H_a(z) = e^{j\theta}z^{N}\tilde{G}_a(-z)$, where $\theta$ is
arbitrary and $N$ is an arbitrary but odd integer. We will take $N = -1$ and $\theta = 0$ for future simplicity. Then
the analysis filters are related as

$$H_a(z) = z^{-1}\tilde{G}_a(-z). \qquad (4.18)$$

In particular we have $|H_a(e^{j\omega})| = |G_a(-e^{j\omega})|$. Combining with the power symmetry property (4.16), we see
that the two analysis filters are power complementary, that is,

$$|G_a(e^{j\omega})|^2 + |H_a(e^{j\omega})|^2 = 1 \qquad (4.19)$$

for all $\omega$. With $G_a(z) = \sum_n g_a(n)z^{-n}$ and $H_a(z) = \sum_n h_a(n)z^{-n}$ we can rewrite (4.18) in the time domain
as

$$h_a(n) = -(-1)^n g_a^*(-n+1). \qquad (4.20)$$
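
The power symmetry and power complementarity relations can be confirmed on a frequency grid. The sketch below is illustrative only: it assumes the 4-tap Daubechies lowpass filter scaled to energy 0.5 as the power-symmetric $G_a(z)$, builds $h_a(n)$ from (4.20) for real coefficients (so $h_a$ is supported on $n = -2, \ldots, 1$), and evaluates (4.16) and (4.19). The helper name `freq_resp` is hypothetical.

```python
import numpy as np

# Assumed power-symmetric lowpass filter g_a(0..3), real coefficients.
s3 = np.sqrt(3.0)
g_a = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0
n_g = np.arange(4)

# (4.20) with real coefficients: h_a(n) = -(-1)^n g_a(1 - n), n = -2..1.
n_h = np.arange(-2, 2)
sign = np.where(n_h % 2 == 0, 1.0, -1.0)      # (-1)^n, valid for negative n
h_a = -sign * g_a[1 - n_h]

def freq_resp(c, n, w):
    """sum_n c(n) e^{-jwn} on the grid w, for samples c at time indices n."""
    return np.exp(-1j * np.outer(w, n)) @ c

w = np.linspace(-np.pi, np.pi, 257)
Ga = freq_resp(g_a, n_g, w)
Ga_shift = freq_resp(g_a, n_g, w + np.pi)     # G_a(-e^{jw}) = G_a(e^{j(w+pi)})
Ha = freq_resp(h_a, n_h, w)

power_sym = np.abs(Ga) ** 2 + np.abs(Ga_shift) ** 2    # (4.16)
power_comp = np.abs(Ga) ** 2 + np.abs(Ha) ** 2         # (4.19)
```

Both sums are constant and equal to one over the whole grid, and $|H_a(e^{j\omega})|$ coincides with $|G_a(-e^{j\omega})|$.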

Relation between analysis and synthesis filters. If we use the condition $\mathbf{R}(z) = \tilde{\mathbf{E}}(z)$ in the
definitions of $\mathbf{G}_s(z)$ and $\mathbf{G}_a(z)$ we obtain $\mathbf{G}_s(z) = \tilde{\mathbf{G}}_a(z)$, from which we conclude that the synthesis filters
are given by $G_s(z) = \tilde{G}_a(z)$ and $H_s(z) = \tilde{H}_a(z)$. We can also rewrite these in the time domain; summarizing
all this we have

$$G_s(z) = \tilde{G}_a(z), \quad H_s(z) = \tilde{H}_a(z), \quad g_s(n) = g_a^*(-n), \quad h_s(n) = h_a^*(-n). \qquad (4.21)$$

The synthesis filter coefficients are time-reversed and conjugated versions of the analysis filter coefficients. Their
frequency responses are conjugates of the analysis filter responses. In particular $|G_s(e^{j\omega})| = |G_a(e^{j\omega})|$ and
$|H_s(e^{j\omega})| = |H_a(e^{j\omega})|$.

In view of the preceding relations, the synthesis filters have all the properties of the analysis filters. For
example, $G_s(e^{j\omega})$ is power symmetric, and the pair $\{G_s(e^{j\omega}), H_s(e^{j\omega})\}$ is power complementary. Finally,
$H_s(z) = z\tilde{G}_s(-z)$, instead of (4.18).

From the preceding discussions we see that in a paraunitary filter bank the filter $G_a(z)$ is power
symmetric, and the remaining filters are derived from $G_a(z)$ as in (4.18) and (4.21). This is precisely the CQF
solution for perfect reconstruction, stated in Sec. 4.2. The observation of the paraunitary
condition offers several advantages. For example it allows us to generalize the perfect reconstruction
condition to $M$-channel filter banks. It also allows us to obtain certain cascaded lattice structures which
guarantee the perfect reconstruction property in spite of quantization of filter coefficients. These details can
be found in [Vaidyanathan, 1993].

Summary  of  Filter  Relations  in  a  Paraunitary  Filter  Bank 

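If the filter bank of Fig. 4.1(a) is paraunitary, then the polyphase matrices $\mathbf{E}(z)$ and $\mathbf{R}(z)$ (Fig. 4.2)
satisfy $\tilde{\mathbf{E}}(z)\mathbf{E}(z) = 0.5\,\mathbf{I}$ and $\tilde{\mathbf{R}}(z)\mathbf{R}(z) = 0.5\,\mathbf{I}$. Equivalently the filter matrices $\mathbf{G}_a(z)$ and $\mathbf{G}_s(z)$ satisfy
$\tilde{\mathbf{G}}_a(z)\mathbf{G}_a(z) = \mathbf{I}$ and $\tilde{\mathbf{G}}_s(z)\mathbf{G}_s(z) = \mathbf{I}$. A number of properties follow from these: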

1. All four filters $G_a(z)$, $H_a(z)$, $G_s(z)$ and $H_s(z)$ are power symmetric. This property is defined, for
example, by the relation (4.15). This means that the filters are spectral factors of half-band filters; for
example $\tilde{G}_s(z)G_s(z)$ is half-band.

2. The analysis filters are related as in (4.18), so the magnitude responses are related as
$|H_a(e^{j\omega})| = |G_a(-e^{j\omega})|$. The synthesis filters are time-reversed conjugates of the analysis filters as shown by (4.21).
In particular $G_s(e^{j\omega}) = G_a^*(e^{j\omega})$ and $H_s(e^{j\omega}) = H_a^*(e^{j\omega})$.

3. The analysis filters form a power complementary pair, that is, (4.19) holds. The same is true for the
synthesis filters.

4. Any two channel paraunitary system satisfies the CQF equations (4.6), (4.7) (except for delays, constant
scale factors, and so forth). Conversely any CQF design is a paraunitary filter bank.

5. The design procedure for two channel paraunitary (i.e., CQF) filter banks is as follows: design a
zero-phase lowpass half-band filter $G(z)$ with $G(e^{j\omega}) \geq 0$ and then extract a spectral factor $G_a(z)$, that is, find
$G_a(z)$ such that $G(z) = \tilde{G}_a(z)G_a(z)$. Then choose the remaining three filters as in (4.7), or equivalently
as in (4.8).

4.5. Parametrization of Paraunitary Filter Banks

There exist factorization theorems for paraunitary matrices which allow us to express the polyphase matrix
as a cascade of elementary paraunitary blocks. This is very helpful in the design as well as implementation
of these filter banks. In this section we will demonstrate the idea with a simple factorization theorem.
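
To give a feel for such a cascade before the formal statement in Theorem 4.1, the sketch below builds a $2 \times 2$ FIR polyphase matrix as a product of planar rotations and the delay matrix $\mathrm{diag}(1, z^{-1})$, and checks numerically that the result is paraunitary. This is a minimal illustration under assumptions: the angles are arbitrary, the scale $\alpha = 1/\sqrt{2}$ is chosen to match the normalization $\tilde{\mathbf{E}}(z)\mathbf{E}(z) = 0.5\,\mathbf{I}$ used in this section, and the function names `rot` and `lattice` are hypothetical.

```python
import numpy as np

def rot(theta):
    """Planar rotation [[cos, sin], [-sin, cos]]."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

def lattice(thetas, alpha=1.0, sign=1.0):
    """Coefficients e[n] (each 2x2) of E(z) = sum_n e[n] z^{-n}, built as
    R_N D(z) R_{N-1} ... D(z) R_0 H_0 with D(z) = diag(1, z^{-1}) and
    H_0 = diag(alpha, sign * alpha)."""
    E = (rot(thetas[0]) @ np.diag([alpha, sign * alpha]))[None, :, :]
    for th in thetas[1:]:
        # Left-multiply by D(z): delay the second row by one sample.
        D = np.zeros((E.shape[0] + 1, 2, 2))
        D[:-1, 0, :] = E[:, 0, :]
        D[1:, 1, :] = E[:, 1, :]
        # Left-multiply every coefficient by the constant rotation.
        E = np.einsum("ij,njk->nik", rot(th), D)
    return E

thetas = [0.3, -1.1, 0.7]                 # arbitrary illustrative angles
E = lattice(thetas, alpha=1 / np.sqrt(2))

# Unitarity on the unit circle, up to the scale alpha^2 = 0.5.
w = 0.9
Ew = sum(E[n] * np.exp(-1j * w * n) for n in range(E.shape[0]))
gram = Ew.conj().T @ Ew

# First analysis filter from the first row of E(z^2) [1, z^{-1}]^T.
g_a = np.zeros(2 * E.shape[0])
g_a[0::2] = E[:, 0, 0]
g_a[1::2] = E[:, 0, 1]
r = np.correlate(g_a, g_a, mode="full")   # autocorrelation, lags -5..5
N = len(g_a) - 1
even_lags = r[N % 2::2]                   # lags -4, -2, 0, 2, 4
```

Whatever angles are chosen, the first filter comes out power symmetric: its autocorrelation is 0.5 at lag zero and vanishes at the other even lags. Whether the filter is a good lowpass filter depends on the angles, which is where the actual design effort goes.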

Theorem 4.1. Let $\mathbf{H}(z) = \sum_{n=0}^{L}\mathbf{h}(n)z^{-n}$ be a $2 \times 2$ real causal FIR transfer matrix (so $\mathbf{h}(n)$ are
$2 \times 2$ matrices with real elements). This is paraunitary if and only if it can be expressed as $\mathbf{H}(z) =
\mathbf{R}_N\boldsymbol{\Lambda}(z)\mathbf{R}_{N-1}\cdots\mathbf{R}_1\boldsymbol{\Lambda}(z)\mathbf{R}_0\mathbf{H}_0$, where

$$\mathbf{R}_m = \begin{bmatrix} \cos\theta_m & \sin\theta_m \\ -\sin\theta_m & \cos\theta_m \end{bmatrix}, \quad \boldsymbol{\Lambda}(z) = \begin{bmatrix} 1 & 0 \\ 0 & z^{-1} \end{bmatrix}, \quad \mathbf{H}_0 = \begin{bmatrix} \alpha & 0 \\ 0 & \pm\alpha \end{bmatrix}, \qquad (4.22)$$

and $\alpha$ and $\theta_m$ are real. $\diamond$

For a proof see [Vaidyanathan, 1993]. The matrix $\mathbf{R}_m$ is unitary. It is called a rotation operator or
Givens rotation. If $\mathbf{y} = \mathbf{R}_m\mathbf{x}$ then $\mathbf{y}$ is obtained by rotating the vector $\mathbf{x}$ clockwise by $\theta_m$. The matrix $\boldsymbol{\Lambda}(z)$ is
a degree one paraunitary system. Fig. 4.4 shows the cascaded structure that results from this factorization.
This is also called a lattice structure. The quantities $N$ and $L$ above are not necessarily equal. For example
if $\mathbf{H}(z) = z^{-1}\mathbf{I}$ then $L = 1$ and $N = 2$.

Fig. 4.4. The cascaded lattice structure for FIR paraunitary systems.

We can guarantee the paraunitary property by using the cascaded structure. Thus if the polyphase
matrix is computed using the cascaded structure, then $G_a(z)$ is guaranteed to be power symmetric, and
the relation $H_a(z) = z^{-1}\tilde{G}_a(-z)$ between the analysis filters automatically holds (except for a delay).
Moreover, as the theorem indicates, the cascaded structure covers every paraunitary system with the specified
restrictions. That is, all two channel (causal, real) FIR paraunitary filter banks have a polyphase matrix of
the form shown in the figure. In particular, the real coefficient CQF can be realized in this manner.

4.6. Maximally Flat Solutions

The half-band filter $G(z) = \tilde{G}_a(z)G_a(z)$ can be designed in many ways. One can get equiripple designs or
maximally flat designs [Oppenheim and Schafer, 1989]. An early technique for designing FIR maximally flat
filters was proposed in [Herrmann, 1971]. This method gives closed form expressions for the filter coefficients
and can be easily adapted for the special case of half-band filters. Moreover, the design automatically
guarantees the condition $G(e^{j\omega}) \geq 0$ (which in particular implies zero phase).

The  family  of  maximally  flat  half  band  filters  designed  by  Herrmann  is  demonstrated  in  Fig.  4.5. 


Fig. 4.5. Maximally flat half-band filter responses with $2K$ zeros at $\pi$.

The transfer function has the form

$$G(z) = z^{K}\left(\frac{1 + z^{-1}}{2}\right)^{2K}\sum_{n=0}^{K-1}\binom{K+n-1}{n}(-z)^n\left(\frac{1 - z^{-1}}{2}\right)^{2n}. \qquad (4.23)$$

The filter has order $4K - 2$. There are $2K$ zeros on the unit circle, and all of these zeros are concentrated at
the point $z = -1$ (i.e., at $\omega = \pi$). The remaining $2K - 2$ zeros are located in the $z$-plane such that $G(z)$ has
the half-band property described earlier (i.e., $G(z) + G(-z) = 1$).

We  will  see  later  (Sec.  13)  that  if  the  CQF  bank  is  designed  by  starting  from  Herrmann’s  maximally 
flat  half-band  filter,  then  it  can  be  used  to  design  continuous  time  wavelets  with  excellent  regularity  (i.e., 
smoothness)  properties. 
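
The coefficients of the maximally flat half-band filter are easy to generate by polynomial multiplication. The sketch below assumes the closed form $G(z) = z^{K}\left(\frac{1+z^{-1}}{2}\right)^{2K}\sum_{n=0}^{K-1}\binom{K+n-1}{n}(-z)^n\left(\frac{1-z^{-1}}{2}\right)^{2n}$ for (4.23); the function name `maxflat_halfband` is a hypothetical helper, not something defined in the chapter. It returns the zero-phase coefficients $g(k)$ for $k = -(2K-1), \ldots, 2K-1$.

```python
import numpy as np
from math import comb

def maxflat_halfband(K):
    """Zero-phase coefficients g(k), k = -(2K-1)..(2K-1), of the assumed
    closed form for the maximally flat half-band filter with 2K zeros at -1."""
    g = np.zeros(4 * K - 1)
    a = np.array([0.5, 0.5])       # (1 + z^{-1}) / 2
    b = np.array([0.5, -0.5])      # (1 - z^{-1}) / 2
    pa = np.array([1.0])
    for _ in range(2 * K):         # ((1 + z^{-1}) / 2)^{2K}
        pa = np.convolve(pa, a)
    for n in range(K):
        pb = np.array([1.0])
        for _ in range(2 * n):     # ((1 - z^{-1}) / 2)^{2n}
            pb = np.convolve(pb, b)
        term = comb(K + n - 1, n) * (-1) ** n * np.convolve(pa, pb)
        # The factor z^{K+n} centres the term; after an overall delay of
        # 2K - 1 samples the n-th term starts at index K - 1 - n.
        start = K - 1 - n
        g[start:start + len(term)] += term
    return g

g2 = maxflat_halfband(2)           # order 4K - 2 = 6
g3 = maxflat_halfband(3)           # order 4K - 2 = 10

# Nonnegativity of the zero-phase response on a frequency grid (K = 3).
w = np.linspace(0, np.pi, 200)
k = np.arange(-5, 6)
resp = (np.exp(-1j * np.outer(w, k)) @ g3).real
```

For $K = 2$ this gives the familiar coefficients $(-1, 0, 9, 16, 9, 0, -1)/32$: the centre tap is 0.5 and the other even-indexed taps vanish, which is the half-band property in the time domain.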


4.7.  Tree  Structured  Filter  Banks 

The idea of splitting a signal $x(n)$ into two subbands can be extended by splitting a subband signal further,
as demonstrated in Fig. 4.6(a). In this example the lowpass subband is split over and over again. This is
called a tree structured filter bank. Each node of the tree is a two-channel analysis filter bank. There are
several variations of this scheme; for example we can choose to have different filter pairs at different levels
of the tree. We can also choose to split the highpass subband; and we can replace the two channel system
with a more general ($M$-channel) system at each node of the tree.

Fig. 4.6. Tree-structured filter banks (a) analysis bank, and (b) synthesis bank.


The synthesis bank corresponding to Fig. 4.6(a) is shown in Fig. 4.6(b). We combine the signals in
pairs, in the same manner that we split them. It can be shown that if $\{G_a(z), H_a(z), G_s(z), H_s(z)\}$ is a
perfect reconstruction system [i.e., satisfies $\hat{x}(n) = x(n)$ when connected in the form of Fig. 4.1(a)] then the
tree structured analysis/synthesis system of Fig. 4.6 has perfect reconstruction $\hat{x}(n) = x(n)$.

The tree structured system can be redrawn in the form shown in Fig. 4.7. For example if we have a tree
structure like Fig. 4.6 with three levels, we have $M = 4$, $n_0 = 2$, $n_1 = 4$, $n_2 = 8$ and $n_3 = 8$. If we assume
that the responses of the analysis filters $G_a(e^{j\omega})$ and $H_a(e^{j\omega})$ are as in Fig. 4.8(a) then the responses of
the analysis filters of Fig. 4.7 are as shown in Fig. 4.8(b). Note that this resembles the wavelet transform
[Fig. 2.8(b)]. The outputs of different filters are subsampled at different rates exactly as for wavelets. Thus
the tree structured filter bank has a close relation to the wavelet transform. In Sec. 10-13 we will present
the precise mathematical connection between the two. We will also see that tree structured filter banks are
closely related to multiresolution analysis (Sec. 10).
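
The redrawing of a three-level tree as a four-channel bank with decimation ratios 2, 4, 8, 8 can be verified numerically. In the sketch below, which is an illustration under assumptions and not the chapter's construction, the Haar pair stands in for $G_a(z)$ and $H_a(z)$, and the equivalent filters are formed as $H_a(z)$, $G_a(z)H_a(z^2)$, $G_a(z)G_a(z^2)H_a(z^4)$ and $G_a(z)G_a(z^2)G_a(z^4)$ by moving decimators past the later stages. The names `f0`..`f3` and `upsample_filter` are hypothetical.

```python
import numpy as np

def upsample_filter(c, M):
    """Coefficients of C(z^M): insert M-1 zeros between the taps of C(z)."""
    u = np.zeros(M * (len(c) - 1) + 1)
    u[::M] = c
    return u

g = np.array([0.5, 0.5])       # Haar lowpass, stands in for G_a(z)
h = np.array([0.5, -0.5])      # Haar highpass, stands in for H_a(z)

rng = np.random.default_rng(1)
x = rng.standard_normal(64)

# Three-level tree: the lowpass branch is split again at each level.
low1, high1 = np.convolve(x, g)[::2], np.convolve(x, h)[::2]
low2, high2 = np.convolve(low1, g)[::2], np.convolve(low1, h)[::2]
low3, high3 = np.convolve(low2, g)[::2], np.convolve(low2, h)[::2]

# Equivalent one-stage filters and their decimation ratios.
gg = np.convolve(g, upsample_filter(g, 2))        # G_a(z) G_a(z^2)
f0 = h                                            # decimate by 2
f1 = np.convolve(g, upsample_filter(h, 2))        # decimate by 4
f2 = np.convolve(gg, upsample_filter(h, 4))       # decimate by 8
f3 = np.convolve(gg, upsample_filter(g, 4))       # decimate by 8
rates = [2, 4, 8, 8]

d0 = np.convolve(x, f0)[::2]
d1 = np.convolve(x, f1)[::4]
d2 = np.convolve(x, f2)[::8]
d3 = np.convolve(x, f3)[::8]
```

The reciprocals of the decimation ratios add up to one, so the redrawn system is still maximally decimated.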


Fig. 4.8. An example of responses (a) $G_a(z)$ and $H_a(z)$, and
(b) Tree-structured analysis bank (frequency axis from 0 to $\pi$, bandedges at $\pi/4$ and $\pi/2$).

4.8.  Filter  Banks  and  Basis  Functions 

Consider again Fig. 4.7. Assuming perfect reconstruction we have $\hat{x}(n) = x(n)$, which means we can express
$x(n)$ in terms of the decimated subband signals $y_k(n)$ and the impulse responses $f_k(n)$ of the filters $F_k(z)$.
This expression has the form

$$x(n) = \sum_{k=0}^{M-1}\sum_{m=-\infty}^{\infty} y_k(m)\,f_k(n - n_k m). \qquad (4.24)$$

This system is analogous to the filter bank systems which represented the continuous-time STFT and
wavelet transforms in Sec. 2 and 3. Thus the collection of subband signals $y_k(m)$ can be regarded as a
time-frequency representation for $x(n)$ which, in this section, is a discrete-time signal. As in Sec. 2.7,
$k$ denotes the "frequency index" and $m$ the "time index" in the transform domain. If we have a perfect
reconstruction filter bank we can recover $x(n)$ from this time-frequency representation using (4.24). The
doubly indexed family of discrete-time sequences $\{\eta_{km}(n)\}$, where $\eta_{km}(n) = f_k(n - n_k m)$, can be regarded as
"basis functions" for the representation of $x(n)$ in terms of the time-frequency coefficients $y_k(m)$.

To make things mathematically more accurate let us say that $x(n) \in \ell^2$ (i.e., $\sum_n |x(n)|^2$ is finite). If
the two channel filter bank $\{G_a(z), H_a(z), G_s(z), H_s(z)\}$ which makes up the tree structure of Fig. 4.6 is
paraunitary, it can be shown that $\{\eta_{km}(n)\}$ is an orthonormal basis for $\ell^2$. Orthonormality means

$$\sum_{n=-\infty}^{\infty}\eta_{k_1 m_1}(n)\,\eta^*_{k_2 m_2}(n) = \delta(k_1 - k_2)\,\delta(m_1 - m_2). \qquad (4.25)$$

While (4.24) resembles the STFT and wavelet representations developed in Sec. 2 and 3, there are differences
as well as similarities. First, it is in discrete time, and second, the basis functions (sequences) are not derived from
a single function. By contrast a wavelet basis $\{2^{k/2}\psi(2^k t - n)\}$ is derived from a single wavelet function
$\psi(t)$. We say that $\{\eta_{km}(n)\}$ is a filter-bank type of basis for the space of sequences. The basis is a doubly
indexed infinite family of sequences, derived from a finite number of filters $\{f_k(n)\}$ by time-shifts of a specific
form. The filter-bank type basis is orthonormal if $\{G_s(z), H_s(z)\}$ is paraunitary [Soman and Vaidyanathan,
1993].

5. DEEPER STUDY OF WAVELETS, FILTERBANKS, AND STFT

From  Sec.  2  and  3,  we  already  know  what  the  wavelet  transform  is  and  how  it  compares  with  the  short 
time  Fourier  transform,  at  least  qualitatively.  We  are  also  familiar  with  time-frequency  representations  and 
digital filter banks. It is time now to fill in several important details, and generally be more quantitative. For
example, we would like to mention some major technical limitations of the STFT which are not obvious from
its definition, and explain that wavelets do not have these limitations.

For example, we will see that if the STFT is used to obtain an orthonormal basis for signals, then
the time-frequency rms durations of the window $v(t)$ (defined in Sec. 3.1) will satisfy $D_tD_f = \infty$. That is,
either the time or the frequency resolution is very poor (Theorem 9.1). It also turns out that if we have an
STFT system where the time-frequency sampling product $\omega_sT_s$ is small enough to admit redundancy (i.e.,
the vectors are not linearly independent as they would be in an orthonormal basis) then the above difficulty
can be eliminated (Sec. 9).

The Gabor transform described in Section 3.1, while admittedly a tempting candidate because of the
optimal time-frequency resolution property ($D_tD_f$ minimized), has a disadvantage. Namely, if we want
to recover the signal $x(t)$ from the STFT coefficients, the reconstruction is unstable in the so-called
critically sampled case (Sec. 9). That is, a small error in the STFT coefficients can lead to a large error in
reconstruction.

The  wavelet  transform  does  not  suffer  from  the  above  limitations  of  the  STFT.  We  will  show  how  to 
construct  orthonormal  wavelet  bases  with  good  time  and  frequency  resolutions  (Sec.  11-13).  We  will  also 
show  that  we  can  start  from  a  paraunitary  digital  filter  bank  and  construct  orthonormal  wavelet  bases  for 
$L^2(R)$ in a very systematic way (Theorem 11.5). Moreover this can be done in such a way that many desired
properties (e.g., compact support, orthonormality, good time-frequency resolution, smoothness, and so forth)
can be incorporated during the construction (Sec. 13). Such a construction is placed in evidence by the
theory of multiresolution, which gives a unified platform for wavelet construction and filter banks (Theorems
10.1, 10.2).

At this point, the reader may want to preview the above-mentioned theorems in the later
sections, in order to get a flavor of things to come. However, to explain these results in a quantitative way,
it is very convenient to review a number of mathematical tools. The need for advanced tools arises because
of the intricacies associated with basis functions for infinite dimensional spaces, i.e., spaces where the set
of basis functions is an infinite set. (For finite dimensional spaces an understanding of elementary matrix
theory would have been sufficient.) For example a representation of the form $x(t) = \sum_n c_n f_n(t)$ in an infinite
dimensional space could be unstable in the sense that a small error in the transform domain $\{c_n\}$ could get
amplified in an unbounded manner during reconstruction. We will talk about a special type of basis called
the  Riesz  basis  which  does  not  have  this  problem  (orthonormal  bases  are  special  cases  of  these).  We  will  also 
talk  about  frames  which  share  many  good  properties  of  the  Riesz  bases  but  may  have  redundant  vectors  (i.e., 
not  a  linearly  independent  set  of  vectors).  For  example,  the  concept  of  frames  will  arise  in  the  comparison  of 
wavelets  and  the  STFT.  We  will  see  that  general  STFT  frames  have  an  advantage  over  STFT  bases.  Frames 
also  come  into  consideration  when  we  explain  the  connection  between  wavelets  and  paraunitary  digital  filter 
banks in Sec. 11.4. When explaining the connection between wavelets and nonunitary filter banks, one
again encounters Riesz bases and the idea of biorthogonality.

Since it is difficult to find all the mathematical background material in one place, we have tried to review
a carefully selected set of topics in the next few sections. These are very useful for a deeper understanding
of wavelets and STFT. The material in Sec. 6 is fairly standard (Lebesgue integrals, $L^p$ spaces, $L^1$ and $L^2$
Fourier transforms). The material in Sec. 7 and 8 (Riesz bases and frames) is less commonly known among
engineers but plays a significant role in wavelet theory. The reader may want to go through these review
sections 6-8 (admittedly dense) once during first reading and then use them primarily as a reference. After
this review we will return to our discussions of wavelets, STFT and filter banks.

6. THE SPACE OF $L^1$ AND $L^2$ SIGNALS

We  developed  the  wavelet  representation  in  Sec.  2.1  based  on  the  framework  of  a  bank  of  bandpass  filters. 
To  make  everything  mathematically  meaningful  it  becomes  necessary  to  carefully  specify  the  types  of  signals, 
types of Fourier transforms, and so forth. For example, as engineers, the concept of ideal bandpass filtering
is appealing to us, but there arises a difficulty. An ideal bandpass filter $H(\omega)$ is not stable, that is, $\int |h(t)|\,dt$
does not exist [Oppenheim and Schafer, 1989]. In other words $h(t)$ does not belong to the space $L^1$ (see
below).

Why should this bother us if we are only discussing theory? Take an example. The frequency domain
developments based on Fig. 2.7, which finally give rise to the time domain expression (2.8), implicitly rely on
the convolution theorem (which says that convolution in time implies multiplication in frequency). However,
the convolution theorem is typically proved only for $L^1$ signals and bounded $L^2$ signals. It is not valid
for arbitrary signals. We therefore need to be careful when using these familiar engineering notions in a
mathematical  discussion. 

6.1.  Lebesgue  integrals 

In  most  engineering  discussions  we  think  of  the  integrals  as  Riemann  integrals.  But  in  order  to  handle  several 
convergence  questions  in  the  development  of  Fourier  series,  convolution  theorems,  wavelet  transforms  and  so 
forth,  it  is  necessary  to  use  Lebesgue  integration.  There  are  many  beautiful  results  for  Lebesgue  integration 
which  are  not  true  for  the  Riemann  integral  under  comparable  assumptions  about  signals.  This  includes 
theorems  that  allow  us  to  interchange  limits,  integrals,  and  infinite  sums  freely. 

In  this  chapter  all  integrals  are  Lebesgue  integrals.  A  review  of  Lebesgue  integration  is  beyond  the  scope 
of  this  chapter.  There  are  many  excellent  references,  for  example  [Kolmogorov  and  Fomin,  1970],  [Haaser 
and  Sullivan,  1971],  and  [Apostol,  1974].  A  few  elementary  comparisons  between  Riemann  and  Lebesgue 
integrals  are  pointed  out  below. 

1. If $x(t)$ is Riemann integrable on a bounded interval $[a, b]$ then it is also Lebesgue integrable on $[a, b]$.
But the converse is not true. For example if we define $x(t) = -1$ for all rationals and $x(t) = 1$ for
all irrationals in $[0, 1]$ then $x(t)$ is not Riemann integrable in $[0, 1]$. But it is Lebesgue integrable, and
$\int_0^1 x(t)\,dt = 1$.

2. A similar statement is not true for the unbounded interval $(-\infty, \infty)$. For the unbounded interval
$(-\infty, \infty)$ the Riemann integral is defined only as a limit called the improper integral.† Consider the
sinc function defined as $s(t) = \sin t / t$ for $t \neq 0$, and $s(0) = 1$. This has improper Riemann integral $\pi$
but is not Lebesgue integrable.

3. If $x(t)$ is Lebesgue integrable then so is $|x(t)|$. The same is not true for Riemann integrals, as
demonstrated by the sinc function $s(t)$ of the preceding paragraph.

4. If $|x(t)|$ is Lebesgue integrable then so is $x(t)$ as long as it is measurable.‡ This, however, is not true for
Riemann integrals. For example if we define $x(t) = -1$ for all rationals and $1$ for all irrationals in $[0, 1]$
then it is not Riemann integrable in $[0, 1]$ though $|x(t)|$ is.

5. If $x(t)$ is (measurable and) bounded by a nonnegative Lebesgue integrable function $g(t)$ [i.e., $|x(t)| \le g(t)$]
then $x(t)$ is Lebesgue integrable.
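
The behavior of the sinc function in items 2 and 3 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `trapezoid` and `sinc_integrals`, the step size, and the truncation points are illustrative choices and not part of the original text. It suggests that the truncated integral of $s(t)$ settles near $\pi$, while the truncated integral of $|s(t)|$ keeps growing (roughly logarithmically) as the interval widens, which is the reason $s(t)$ is not Lebesgue integrable.

```python
import numpy as np

def trapezoid(y, dx):
    """Composite trapezoidal rule on a uniform grid with spacing dx."""
    return dx * (np.sum(y) - 0.5 * (y[0] + y[-1]))

def sinc_integrals(T, dx=0.01):
    """Return (integral of s, integral of |s|) over [-T, T], with s(t) = sin(t)/t."""
    n = int(round(2 * T / dx)) + 1
    t = np.linspace(-T, T, n)
    # np.sinc(u) is sin(pi*u)/(pi*u), so np.sinc(t/pi) equals sin(t)/t (and 1 at t = 0).
    s = np.sinc(t / np.pi)
    step = t[1] - t[0]
    return trapezoid(s, step), trapezoid(np.abs(s), step)

for T in (10.0, 100.0, 1000.0):
    signed, absolute = sinc_integrals(T)
    print(f"T = {T:7.1f}   integral of s = {signed:.4f}   integral of |s| = {absolute:.4f}")
```

The first column of results approaches $\pi$ as $T$ grows; the second does not level off, so no finite value can be assigned to $\int |s(t)|\,dt$.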

Sets of Measure Zero

A subset $S$ of real numbers is said to have measure zero if, given $\epsilon > 0$, we can find a countable union
$\bigcup_i I_i$ of open intervals $I_i$ [intervals of the form $(a_i, b_i)$, i.e., $a_i < x < b_i$] such that (i) $S \subset \bigcup_i I_i$ and (ii) the
total length of the intervals is $< \epsilon$. For example the set of all integers (in fact any countable set of real numbers,
e.g., rationals) has measure zero. There exist uncountable sets of real numbers which have measure zero, a
famous example being the Cantor set [Apostol, 1974].

When we say that something is true “almost everywhere” (abbreviated a.e.), or “for almost all $t$,” it means
that the statement holds everywhere except possibly on a set of measure zero. For example if $x(t) = y(t)$
everywhere except for integer values of $t$, then $x(t) = y(t)$ a.e. An important fact in Lebesgue integration
theory is that if two Lebesgue integrable functions are equal a.e., then their integrals are equal. In particular
if $x(t) = 0$ a.e., then the Lebesgue integral $\int x(t)\,dt$ exists and is equal to zero.

† Essentially we consider $\int_{-a}^{b} x(t)\,dt$ and let $a$ and $b$ go to $\infty$ separately. This limit, the improper Riemann
integral, should not be confused with the Cauchy principal value which is the limit of $\int_{-a}^{a} x(t)\,dt$ as $a \to \infty$.
The function $x(t) = t$ has Cauchy principal value $= 0$, but the improper Riemann integral does not exist.
‡ The notion of a measurable function is very subtle. Any continuous function is measurable, and any
Lebesgue integrable function is measurable. In fact, examples of nonmeasurable functions are so rare and
so hard to construct, that there is practically no danger we will run into one. We take measurability for
granted and never mention it.

Convergence Theorems

What  makes  the  Lebesgue  integral  so  convenient  is  the  existence  of  some  powerful  theorems,  which 
allow  us  to  interchange  limits  with  integrals  and  summations,  under  very  mild  conditions.  These  theorems 
have  been  at  the  center  of  many  beautiful  results  in  Fourier  and  wavelet  transform  theory. 

Let $\{g_k(t)\}$, $1 \le k < \infty$, be a sequence of Lebesgue integrable functions. In general this sequence may
not have a limit, and even if it did, the limit may not be integrable. Under some further mild postulates,
we can talk about limits and their integrals. In what follows we often say “$g(t)$ is a pointwise limit a.e. of
the sequence $\{g_k(t)\}$” or “$\{g_k(t)\}$ converges to $g(t)$ a.e.” This means that for any chosen value of $t$ (except
possibly in a set of measure zero), we have $g_k(t) \to g(t)$ as $k \to \infty$.

Monotone convergence theorem. Suppose (a) $\{g_k(t)\}$ is nondecreasing a.e. (i.e., for almost all
values of $t$, $g_k(t)$ is nondecreasing in $k$) and (b) $\int g_k(t)\,dt$ is a bounded sequence. Then $\{g_k(t)\}$ converges
a.e. to a Lebesgue integrable function $g(t)$ and $\lim_k \int g_k(t)\,dt = \int \lim_k g_k(t)\,dt$, i.e., $\lim_k \int g_k(t)\,dt = \int g(t)\,dt$.
That is, we can interchange the limit with the integral.

Dominated convergence theorem. Suppose (a) $\{g_k(t)\}$ is dominated by a nonnegative Lebesgue
integrable function $f(t)$, i.e., $|g_k(t)| \le f(t)$ a.e., and (b) $\{g_k(t)\}$ converges to a limit $g(t)$ a.e. Then the limit
$g(t)$ is Lebesgue integrable and $\lim_k \int g_k(t)\,dt = \int \lim_k g_k(t)\,dt$, i.e., $\lim_k \int g_k(t)\,dt = \int g(t)\,dt$. That is, we can
interchange the limit with the integral.
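
To see why the domination hypothesis matters, the sketch below compares two sequences on $[0, 1]$. Both examples are our own illustrations (the names `spike`, `power`, and `midpoint_integral` are hypothetical, and NumPy is assumed). The pulse $g_k(t) = k$ on $(0, 1/k)$ converges to zero at every $t$, yet each integral equals 1, so the limit and the integral cannot be interchanged; no integrable function dominates the whole sequence, because the pointwise supremum behaves like $1/t$ near zero. The sequence $h_k(t) = t^k$ is dominated by the integrable function $f(t) = 1$, and its integrals do converge to the integral of the a.e. limit, which is zero.

```python
import numpy as np

def midpoint_integral(f, n=1_000_000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    t = (np.arange(n) + 0.5) / n
    return float(np.sum(f(t)) / n)

def spike(k):
    """g_k(t) = k on (0, 1/k) and 0 elsewhere: a tall, narrow pulse of unit area."""
    return lambda t: np.where(t < 1.0 / k, float(k), 0.0)

def power(k):
    """h_k(t) = t**k on [0, 1]; dominated by the integrable function f(t) = 1."""
    return lambda t: t ** k

for k in (10, 100, 1000):
    # At the fixed point t = 0.05 the pulse is eventually zero (once 1/k <= 0.05).
    value_at_point = float(spike(k)(np.array([0.05]))[0])
    print(k, value_at_point, midpoint_integral(spike(k)), midpoint_integral(power(k)))
```

The printed integrals of `spike(k)` stay at 1 for every `k`, while those of `power(k)` shrink like $1/(k+1)$.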

Levi’s theorem. Suppose $\int \sum_{k=1}^{m} |g_k(t)|\,dt$ is a bounded sequence in $m$. Then
$\int \sum_{k=1}^{\infty} g_k(t)\,dt = \sum_{k=1}^{\infty} \int g_k(t)\,dt$. In particular this means that $\sum_{k=1}^{\infty} g_k(t)$ converges a.e. to a Lebesgue
integrable function. This theorem permits us to interchange infinite sums with the integrals.

Fatou’s Lemma. Let (a) $g_k(t) \ge 0$ a.e., (b) $g_k(t) \to g(t)$ a.e., and (c) $\int g_k(t)\,dt \le A$ for some
$0 < A < \infty$. Then the limit $g(t)$ is Lebesgue integrable and $\int g(t)\,dt \le A$. (There exist stronger versions of
this result [Rudin, 1966], but we shall not require them here.)

6.2. $L^p$ signals

Let $p$ be an integer such that $1 \le p < \infty$. A signal $x(t)$ is said to be an $L^p$ signal if (it is measurable, and)
$\int |x(t)|^p\,dt$ exists. We define the $L^p$ norm of $x(t)$ as $\|x(t)\|_p = [\int |x(t)|^p\,dt]^{1/p}$. For fixed $p$ the set of $L^p$ signals
forms a vector space. It is a normed linear vector space, with norm defined as above. The term “linear”
means that if $x(t)$ and $y(t)$ are in $L^p$, then $\alpha x(t) + \beta y(t)$ is also in $L^p$ for any complex $\alpha$ and $\beta$.

Since any two signals $x(t)$ and $y(t)$ that are equal a.e. cannot be distinguished (i.e., $\|x(t) - y(t)\|_p = 0$),
each element in $L^p$ is in reality “a set of functions that are equal a.e.” Each such set becomes an “equivalence
class” in mathematical language.




For $p = 2$ the quantity $\|x(t)\|_p^2$ is equal to the energy of $x(t)$, as defined in signal processing texts. Thus
an $L^2$ signal is a finite-energy (or square-integrable) signal. For $p = \infty$ the above definitions do not make
sense, and we simply define $L^\infty$ to be the space of essentially bounded signals. A signal $x(t)$ is said to be
essentially bounded if there is a number $B < \infty$ such that $|x(t)| \le B$ a.e. We often omit the term “essential”
for simplicity; it arises because of the “a.e.” in the inequality. The norm $\|x(t)\|_\infty$ is taken as the essential
supremum of $|x(t)|$ over all $t$. That is, $\|x(t)\|_\infty$ is the smallest number such that $|x(t)| \le \|x(t)\|_\infty$ a.e.

For us $L^1$, $L^2$ and $L^\infty$ functions are particularly interesting. Note that neither $L^1$ nor $L^2$ contains the
other. However bounded $L^1$ functions are in $L^2$, and $L^2$ functions on bounded intervals are in $L^1$. That is,

$L^1 \cap L^\infty \subset L^2$ and $L^2[a, b] \subset L^1[a, b]$. (6.1)

Thus $L^2$ is already bigger than bounded $L^1$ functions. Moreover,

$x(t) \in L^1 \cap L^\infty \Rightarrow x(t) \in L^p$ for all $p > 1$.

This follows because $|x(t)|^p \le |x(t)|\,\|x(t)\|_\infty^{p-1}$ a.e. Thus $|x(t)|^p$ is (measurable and) bounded by a Lebesgue
integrable function (since $|x(t)|$ is integrable), and is therefore integrable.
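
To make concrete the claim that neither space contains the other, the following worked example exhibits one signal in each space that is not in the other, and recalls why the second inclusion in (6.1) holds on a bounded interval. The signals $x_1$ and $x_2$ are our own illustrative choices, not taken from the original text, and the last line uses the Schwarz inequality.

```latex
\begin{align*}
x_1(t) &= \begin{cases} t^{-1/2}, & 0 < t \le 1 \\ 0, & \text{otherwise} \end{cases}
  & \int |x_1(t)|\,dt &= \int_0^1 t^{-1/2}\,dt = 2,
  & \int |x_1(t)|^2\,dt &= \int_0^1 \frac{dt}{t} = \infty, \\
x_2(t) &= \begin{cases} 1/t, & t \ge 1 \\ 0, & \text{otherwise} \end{cases}
  & \int |x_2(t)|\,dt &= \int_1^\infty \frac{dt}{t} = \infty,
  & \int |x_2(t)|^2\,dt &= \int_1^\infty \frac{dt}{t^2} = 1.
\end{align*}
% Hence x_1 is in L^1 but not in L^2, and x_2 is in L^2 but not in L^1.
% On a bounded interval [a, b], the Schwarz inequality gives L^2[a,b] \subset L^1[a,b]:
\[
\int_a^b |x(t)|\,dt
  \le \Big( \int_a^b 1\,dt \Big)^{1/2} \Big( \int_a^b |x(t)|^2\,dt \Big)^{1/2}
  = \sqrt{b-a}\;\|x(t)\|_2 .
\]
```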

Orthonormal signals in $L^2$

The inner product $\langle x(t), y(t) \rangle = \int x(t) y^*(t)\,dt$ always exists for any $x(t)$ and $y(t)$ in $L^2$. Thus the product
of two $L^2$ functions is an $L^1$ function. If $\langle x(t), y(t) \rangle = 0$ we say that $x(t)$ and $y(t)$ are orthogonal. Clearly
$\|x(t)\|_2^2 = \langle x(t), x(t) \rangle$. Consider a sequence $\{g_n(t)\}$ of signals such that any pair of these are orthogonal,
and $\|g_n(t)\|_2 = 1$ for all $n$. This is said to be an orthonormal sequence. The following two results are
fundamental.

Theorem 6.1. Let $\{g_n(t)\}$, $1 \le n < \infty$, be an orthonormal sequence in $L^2$. Define $c_n = \langle x(t), g_n(t) \rangle$ for
some $x(t) \in L^2$. Then the sum $\sum_n |c_n|^2$ converges, and $\sum_n |c_n|^2 \le \|x(t)\|_2^2$.

Theorem 6.2. (Riesz-Fischer theorem). Let $\{g_n(t)\}$, $1 \le n < \infty$, be an orthonormal sequence in $L^2$
and let $\{c_n\}$ be a sequence of complex numbers such that $\sum_n |c_n|^2$ converges. Then there exists $x(t) \in L^2$
such that $c_n = \langle x(t), g_n(t) \rangle$, and $x(t) = \sum_n c_n g_n(t)$ (with equality interpreted in the $L^2$ sense, see below).
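
A finite dimensional analogue of these two theorems can be tried out directly. The sketch below is only an illustration under stated assumptions: vectors in $\mathbb{C}^{64}$ stand in for $L^2$ signals, the orthonormal set is produced by a QR factorization of a random matrix, and the names `G`, `c`, and `y` are hypothetical. It checks the inequality of Theorem 6.1 for an arbitrary vector, and then, in the spirit of Theorem 6.2, synthesizes a vector from the coefficients and recovers the same coefficients from it.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 64, 10  # ambient dimension and number of orthonormal vectors (arbitrary choices)

# Orthonormal columns g_1, ..., g_K from the reduced QR factorization of a random matrix.
A = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
G, _ = np.linalg.qr(A)  # G has shape (N, K) and satisfies G^H G = I

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Coefficients c_n = <x, g_n> = sum_t x(t) * conj(g_n(t)).
c = G.conj().T @ x

coefficient_energy = float(np.sum(np.abs(c) ** 2))
signal_energy = float(np.vdot(x, x).real)
print(coefficient_energy, signal_energy)  # the first never exceeds the second

# Synthesis from the coefficients, then analysis again.
y = G @ c
c_back = G.conj().T @ y
print(np.allclose(c_back, c))
```

Because only 10 of the 64 possible orthonormal directions are used, the coefficient energy is strictly smaller than the signal energy here; equality would require a complete orthonormal basis.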

The space $L^2$ is more convenient to work with than $L^1$. For example the inner product and the concept
of orthonormality are undefined in $L^1$. Moreover, as we shall see in Sec. 6.3, the Fourier transform in $L^2$ has
more time-frequency symmetry than in $L^1$. In Sec. 7.3 we will define unconditional bases, which have the
property that any rearrangement continues to be a basis. It turns out that any orthonormal basis in $L^2$ is
unconditional, whereas the space $L^1$ does not even have an unconditional basis!

Equality and Convergence in $L^p$ Sense

Let $x(t)$ and $y(t)$ be $L^p$ functions ($p < \infty$). Then $\|x(t) - y(t)\|_p = 0$ if and only if $x(t) = y(t)$ a.e. Thus
if $x(t)$ and $y(t)$ differ only for every rational $t$ we still have $\|x(t) - y(t)\|_p = 0$. Whenever $\|x(t) - y(t)\|_p = 0$,
we say that $x(t) = y(t)$ in the $L^p$ sense. Now consider a statement of the form

$x(t) = \sum_{n=1}^{\infty} c_n g_n(t)$ (6.2)

for $p < \infty$, where $g_n(t)$ and $x(t)$ are in $L^p$. This means that the sum converges to $x(t)$ in the $L^p$ sense, that
is, $\|x(t) - \sum_{n=1}^{N} c_n g_n(t)\|_p$ goes to zero as $N \to \infty$. If we modify the limit $x(t)$ by adding some number to
$x(t)$ for all rational $t$, the result is still a limit of $\sum_n c_n g_n(t)$ in the $L^p$ sense! $L^p$ limits are unique only in
the a.e. sense. We omit the phrase “in the $L^p$ sense” whenever it is clear from the context.

The $\ell^p$ Spaces

Let $p$ be an integer with $1 \le p < \infty$. The collection of all sequences $x(n)$ such that $\sum_n |x(n)|^p$
converges to a finite value is denoted $\ell^p$. This is a linear space with norm $\|x(n)\|_p$ defined such that
$\|x(n)\|_p = (\sum_n |x(n)|^p)^{1/p}$. Unlike $L^p$ spaces, the $\ell^p$ spaces satisfy the following inclusion rule:

$\ell^1 \subset \ell^2 \subset \ell^3 \subset \ldots$ (6.3)

The spaces $\ell^1$ and $\ell^2$ are especially interesting in circuits and signal processing. If $h(n) \in \ell^1$ then $\sum_n |h(n)| < \infty$.
This is precisely the condition for the BIBO (bounded-input bounded-output) stability of a linear time
invariant system with impulse response $h(n)$ [Oppenheim, et al., 1983].

Continuity  of  Inner  Products 

If $\{x_n(t)\}$ is a sequence in $L^2$ and has an $L^2$ limit $x(t)$, then for any $y(t) \in L^2$,

$\lim_{n \to \infty} \langle x_n(t), y(t) \rangle = \langle \lim_{n \to \infty} x_n(t), y(t) \rangle = \langle x(t), y(t) \rangle$ (6.4)

with the second limit interpreted in the $L^2$ sense. Thus, limits can be interchanged with inner product
signs. Similarly infinite summation signs can be interchanged with the inner product sign, that is,
$\sum_{n=1}^{\infty} \langle \alpha_n x_n(t), y(t) \rangle = \langle \sum_{n=1}^{\infty} \alpha_n x_n(t), y(t) \rangle$, provided the second summation is regarded as an $L^2$ limit.
These follow from the fundamental property that inner products are continuous [Rudin, 1966].

Next suppose $\{x_n(t)\}$ is a sequence of functions in $L^p$ for some integer $p \ge 1$ and suppose $x_n(t) \to x(t)$
in the $L^p$ sense. Then $\|x_n(t)\|_p \to \|x(t)\|_p$ as well. We can rephrase this as

$\lim_{n \to \infty} \|x_n(t)\|_p = \| \lim_{n \to \infty} x_n(t) \|_p = \|x(t)\|_p$ (6.5)

Thus the limit sign can be interchanged with the norm sign, where the limit in the second expression is in
the $L^p$ sense. This follows because $|\,\|x_n(t)\|_p - \|x(t)\|_p\,| \le \|x_n(t) - x(t)\|_p \to 0$ as $n \to \infty$.

6.3.  Fourier  transforms 

The Fourier transform (FT) is defined for $L^1$ signals and $L^2$ signals in different ways.† The properties of
these two types of FT are significantly different. In the signal processing literature, where we ultimately seek
engineering  solutions  (such  as  filter  approximation  with  rational  transfer  functions),  this  distinction  often  is 
not  necessary.  But  when  we  try  to  establish  that  a  certain  set  of  signals  is  a  basis  for  a  certain  class,  we  have 
to  be  careful,  especially  if  we  use  tools  such  as  the  FT,  convolution  theorem,  and  so  forth  (as  we  implicitly 
did  in  Sec.  2).  Detailed  references  for  this  section  include  Rudin  [1966],  Apostol  [1974],  and  Chui  [1992a]. 

The $L^1$ Fourier Transform

Given a signal $x(t) \in L^1$, its Fourier transform $X(\omega)$ (the $L^1$-FT) is defined in the way familiar to
engineers:

$X(\omega) = \int_{-\infty}^{\infty} x(t) e^{-j\omega t}\,dt$ (6.6)

The existence of this integral is assured by the fact that $x(t)$ is in $L^1$.‡ In fact the above integral exists if
and only if $x(t) \in L^1$. The $L^1$ Fourier transform has the following properties:

1. $X(\omega)$ is a continuous function of $\omega$.

2. $X(\omega) \to 0$ as $|\omega| \to \infty$. This is called the Riemann-Lebesgue Lemma.

3. $X(\omega)$ is bounded, and $|X(\omega)| \le \|x(t)\|_1$.

In engineering applications we often draw the ideal lowpass filter response ($F(\omega)$ in Fig. 2.3) and consider
it as the Fourier transform of the impulse response $f(t)$. But this frequency response is discontinuous and
already violates Property 1. This is because $f(t)$ is not in $L^1$ and $F(\omega)$ is not the $L^1$-FT of $f(t)$. That $f(t)$
is not in $L^1$ is consistent with the fact that the ideal filter is not BIBO stable (i.e., a bounded input may
not produce bounded output, since $\int |f(t)|\,dt$ is not finite).

The inverse Fourier transform. In general the $L^1$-FT $X(\omega)$ of an $L^1$ signal is not in $L^1$. Example: if
$x(t)$ is the rectangular pulse, then $X(\omega)$ is the sinc function which is not absolutely integrable. Thus the
familiar inverse transform formula

$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega) e^{j\omega t}\,d\omega$ (6.7)

does not make sense in general. However since $X(\omega)$ is continuous and bounded, it is integrable on any
bounded interval, so $\int_{-c}^{c} X(\omega) e^{j\omega t}\,d\omega / 2\pi$ exists for any finite $c$. This quantity may even have a limit as
$c \to \infty$, even if the Lebesgue integral or improper Riemann integral does not exist. Such a limit (the Cauchy
principal value) does represent the original function $x(t)$ under some conditions.

† For more general signals the FT can sometimes be defined in the “distribution sense”; see Appendix A.
‡ Since $x(t)$ is Lebesgue integrable (hence measurable) the product $x(t) e^{-j\omega t}$ is measurable, and it is
bounded by the integrable function $|x(t)|$. So $x(t) e^{-j\omega t}$ is integrable.

Case 1. Thus, suppose $x(t) \in L^1$ and suppose that it is of bounded variation in an interval $[a, b]$, that
is, it can be expressed as the difference of two nondecreasing functions [Apostol, 1974]. Then we can show
that the above Cauchy principal value exists, and

$\frac{x(t^+) + x(t^-)}{2} = \lim_{c \to \infty} \frac{1}{2\pi} \int_{-c}^{c} X(\omega) e^{j\omega t}\,d\omega$ (6.8)

for every $t \in (a, b)$. The notations $x(t^-)$ and $x(t^+)$ are the left hand limit and right hand limit, respectively,
of $x(\cdot)$ at $t$; for functions of bounded variation, these limits can be shown to exist. If $x(\cdot)$ is continuous at $t$,
then $x(t^-) = x(t^+) = x(t)$ and the above reduces to the familiar inversion formula.

Case 2. Suppose now that $x(t) \in L^1$ and $X(\omega) \in L^1$ as well. Then the integral $y(t) = \int X(\omega) e^{j\omega t}\,d\omega / 2\pi$
exists as a Lebesgue integral, and $y(t) = x(t)$ almost everywhere [Rudin, 1966]. In particular, if $x(\cdot)$ is
continuous at $t$ then $x(t) = \int X(\omega) e^{j\omega t}\,d\omega / 2\pi$.

It turns out that if $x(t)$ and $X(\omega)$ are both in $L^1$ then they are both in $L^2$ as well. This is shown as
follows: since $x(t) \in L^1$ implies that $X(\omega)$ is bounded, we see that $X(\omega) \in L^1 \cap L^\infty$. So $X(\omega) \in L^p$ for all
integer $p$ (Sec. 6.2). In particular $X(\omega) \in L^2$, so $x(t) \in L^2$ as well (by Parseval’s relation, see below).

The $L^2$ Fourier Transform

The $L^1$ Fourier transform lacks the convenient property of time-frequency symmetry. For example,
even though $x(t)$ is in $L^1$, $X(\omega)$ may not be in $L^1$. Also even though $x(t)$ may not be continuous, $X(\omega)$ is
necessarily continuous. The space $L^2$ is much easier to work with. Not only can we talk about inner products
and orthonormal bases, there is also perfect symmetry between time and frequency domains, as we shall see.
We need to define the $L^2$-FT differently because the usual definition (6.6) is meaningful only for $L^1$ signals.
Suppose $x(t) \in L^2$ and we truncate it to the interval $[-n, n]$. This truncated version is in $L^1$ because of (6.1),
and its Fourier transform exists:

$X_n(\omega) = \int_{-n}^{n} x(t) e^{-j\omega t}\,dt$ (6.9)

It can be shown that $X_n(\omega)$ is in $L^2$ and that the sequence $\{X_n(\omega)\}$ has a limit in $L^2$. That is, there exists
an $L^2$ function $X(\omega)$ such that

$\lim_{n \to \infty} \|X_n(\omega) - X(\omega)\|_2 = 0$ (6.10)

This limit $X(\omega)$ is defined to be the $L^2$ Fourier transform of $x(t)$. Some of the properties are listed next.

1. $X(\omega)$ is in $L^2$, and we can compute $x(t)$ from $X(\omega)$ in an entirely analogous manner, namely as the $L^2$
limit of $\int_{-n}^{n} X(\omega) e^{j\omega t}\,d\omega / 2\pi$.

2. If $x(t)$ is in $L^1$ and $L^2$, then the above computation gives the same answer as the $L^1$-FT (6.6) a.e. For
example consider the rectangular pulse $x(t) = 1$ in $[-1, 1]$ and zero otherwise. This is in $L^1$ and $L^2$ and
the Fourier transform using either definition is $X(\omega) = 2 \sin\omega / \omega$. This answer is in $L^2$ but not in $L^1$.
The inverse $L^2$-FT of $X(\omega)$ is the original $x(t)$.

3. If $x(t) \in L^2$ and $X(\omega) \in L^1$ then the Lebesgue integral $\int X(\omega) e^{j\omega t}\,d\omega / 2\pi$ exists, and equals $x(t)$ a.e.

4. Parseval’s relation holds, i.e., $\sqrt{2\pi}\,\|x(t)\|_2 = \|X(\omega)\|_2$. Thus the $L^2$-FT is a linear transformation from
$L^2$ to $L^2$ which preserves norms except for the scale factor $\sqrt{2\pi}$. (Note that this would not make sense if
$x(t)$ were only in $L^1$.) In particular it is a bounded transformation because the norm $\|X(\omega)\|_2$ in the
transform domain is bounded by a constant multiple of the norm $\|x(t)\|_2$ in the original domain.

5. Unlike the $L^1$-FT, the $L^2$-FT $X(\omega)$ need not be continuous. For example, the impulse response of an
ideal lowpass filter (sinc function) is in $L^2$ and its Fourier transform is not continuous.

6. Let $\{f_n(t)\}$ be a sequence in $L^2$ and let $x(t) = \sum_n c_n f_n(t)$ be a convergent summation (in the $L^2$
sense). With upper case letters denoting the $L^2$-FTs, we then have $X(\omega) = \sum_n c_n F_n(\omega)$. This result is
obvious for finite summations because of linearity of the FT. For infinite summations this follows from
the property that the $L^2$-FT is a continuous mapping from $L^2$ to $L^2$. (This in turn follows from the
result that it is a bounded linear transformation [Naylor and Sell, 1982].) The continuity allows us to
move the FT operation inside the infinite summation.

Thus there is complete symmetry between the time and frequency domains. The $L^2$-FT is a one-to-one
mapping from $L^2$ onto $L^2$. Moreover since $\sqrt{2\pi}\,\|x(t)\|_2 = \|X(\omega)\|_2$, it is a norm preserving mapping (up to
the scale factor $\sqrt{2\pi}$); one says that the $L^2$-FT is an isometry from $L^2$ to $L^2$.
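
Parseval’s relation can be checked numerically for the rectangular pulse used as an example in the list of properties, whose transform is $2 \sin\omega / \omega$. The following is a rough sketch, assuming NumPy; the truncation frequency `W`, the step `dw`, and the helper `trapezoid` are our own illustrative choices. The time-domain energy of the pulse is exactly 2, so the relation predicts a frequency-domain energy of $2\pi \cdot 2 = 4\pi$; truncating the frequency axis to $[-W, W]$ leaves a small tail error of roughly $4/W$.

```python
import numpy as np

def trapezoid(y, dx):
    """Composite trapezoidal rule on a uniform grid with spacing dx."""
    return dx * (np.sum(y) - 0.5 * (y[0] + y[-1]))

# Time-domain energy of the rectangular pulse x(t) = 1 on [-1, 1]: exactly 2.
energy_time = 2.0

# Frequency-domain energy of X(w) = 2 sin(w)/w, truncated to [-W, W].
W, dw = 2000.0, 0.002
w = np.linspace(-W, W, int(round(2 * W / dw)) + 1)
X = 2.0 * np.sinc(w / np.pi)  # np.sinc(u) = sin(pi*u)/(pi*u), so this is 2 sin(w)/w
step = w[1] - w[0]
energy_freq = trapezoid(X ** 2, step)

# Parseval: ||X||^2 should be close to 2*pi*||x||^2.
print(energy_freq, 2.0 * np.pi * energy_time)
```

The two printed numbers agree to within the truncation error, which shrinks as `W` is increased.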

The $\ell^1$ Fourier Transform

If a sequence $x(n) \in \ell^1$ then its discrete-time FT $X(e^{j\omega}) = \sum_n x(n) e^{-j\omega n}$ exists, and is the $\ell^1$-FT of
$x(n)$. It can be shown that $X(e^{j\omega})$ is a continuous function of $\omega$ and that $|X(e^{j\omega})|$ is bounded.

6.4.  Convolutions 

Suppose $h(t) \in L^1$ and $x(t) \in L^p$ for some $p$ in $1 \le p \le \infty$. Then the familiar convolution integral defined
by $(x * h)(t) = \int x(\tau) h(t - \tau)\,d\tau$ exists for almost all $t$ [Rudin, 1966]. If we define a function $y(t)$ to be
$x * h$ where it exists and to be zero elsewhere, the result is in fact an $L^p$ function. We simply say that the
convolution of an $L^1$ function with an $L^p$ function gives an $L^p$ function. By recalling that an LTI system is
stable (i.e., BIBO stable, Sec. 1) if and only if its impulse response is in $L^1$, we therefore have the following
examples:

1. If an $L^1$ signal is input to a stable LTI system, the output is in $L^1$. Since the convolution of two $L^1$
signals is in $L^1$, the cascade of two stable LTI systems is stable, a readily accepted fact in engineering.

2. If an $L^2$ signal (finite energy input) is input to a stable LTI system, the output is in $L^2$.

3. If an $L^\infty$ signal is input to a stable LTI system, the output is in $L^\infty$ (i.e., bounded inputs produce
bounded outputs).

If $x(t)$ and $h(t)$ are both in $L^1$, their convolution $y(t)$ is in $L^1$, and all three signals have $L^1$ Fourier transforms.
The convolution theorem [Rudin, 1966] says that these three are related as $Y(\omega) = H(\omega) X(\omega)$. When signals
are not necessarily in $L^1$ we cannot in general write this, even if convolution might itself be well defined!

Convolution Theorems for $L^2$ Signals

For all our discussions in the preceding sections, the signals have been restricted to be in $L^2$ but not
necessarily in $L^1$. In fact, even the filters are often only in $L^2$. For example, ideal bandpass filters (Fig. 2.8)
are unstable, and therefore only in $L^2$. For arbitrary $L^2$ signals $x(t)$ and $h(t)$, the convolution theorem does
not hold. We therefore need to understand $L^2$ convolution more carefully.

Assume that $x(t)$ and $h(t)$ are both in $L^2$. Their convolution $y(t) = \int x(\tau) h(t - \tau)\,d\tau$ exists for all
$t$, since the integral is just an inner product in $L^2$. Using the Schwarz inequality [Rudin, 1966], we also have
$|y(t)| \le \|x(t)\|_2 \|h(t)\|_2$, that is, $y(t) \in L^\infty$. Suppose the filter $h(t)$ has the further property that the frequency
response $H(\omega)$ is bounded, that is, $|H(\omega)| \le B$ a.e., for some $B < \infty$. Then we can show that $y(t) \in L^2$
and that the convolution theorem holds, that is, $Y(\omega) = H(\omega) X(\omega)$. To prove this, note that

$y(t) = \int x(\tau) h(t - \tau)\,d\tau = \frac{1}{2\pi} \int X(\omega) H(\omega) e^{j\omega t}\,d\omega$ (6.11)

from Parseval’s relation which holds for $L^2$ signals [Rudin, 1966]. If $|H(\omega)| \le B$, then $|X(\omega) H(\omega)|^2 \le B^2 |X(\omega)|^2$.
So $|X(\omega) H(\omega)|^2$ is bounded by the integrable function $B^2 |X(\omega)|^2$, and is therefore integrable (Sec.
6.1). Thus $X(\omega) H(\omega) \in L^2$, and the preceding equation establishes that $y(t) \in L^2$. The equation also shows
that $y(t)$ and $H(\omega) X(\omega)$ form an $L^2$ FT pair, so $Y(\omega) = H(\omega) X(\omega)$ indeed.

Bounded $L^2$ filters. Filters for which $h(t) \in L^2$ and $H(\omega)$ is bounded will be called bounded $L^2$ filters.
The preceding discussion shows that bounded $L^2$ filters admit the convolution theorem though arbitrary $L^2$
filters do not. Another advantage of bounded $L^2$ filters is that a cascade of two bounded $L^2$ filters $h_1(t)$ and
$h_2(t)$ is a bounded $L^2$ filter, just as a cascade of two stable filters would be stable. To see this note that
the cascaded impulse response is the convolution $h(t) = (h_1 * h_2)(t)$. By the preceding discussion, $h(t) \in L^2$,
and moreover $H(\omega) = H_1(\omega) H_2(\omega)$. Clearly $H(\omega)$ is still bounded. Bounded $L^2$ filters are therefore very
convenient to work with. Fortunately, all filters in the discussion of wavelets and filter banks are bounded $L^2$
filters, even though they may not be BIBO stable (like the ideal bandpass filters in Fig. 2.8). We summarize
the preceding discussions as follows:

Theorem 6.3. Convolution of $L^2$ functions. We say that $h(t)$ is a bounded $L^2$ filter if $h(t) \in L^2$
and $|H(\omega)| \le B < \infty$ a.e.

1. Let $x(t) \in L^2$, and let $h(t)$ be a bounded $L^2$ filter. Then $y(t) = (x * h)(t)$ exists for all $t$ and $y(t) \in L^2$.
Moreover $Y(\omega) = H(\omega) X(\omega)$.

2. If $h_1(t)$ and $h_2(t)$ are bounded $L^2$ filters, then their cascade $h(t) = (h_1 * h_2)(t)$ is a bounded $L^2$ filter,
and $H(\omega) = H_1(\omega) H_2(\omega)$.
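
A discrete, finite-length analogue of both parts of the theorem is easy to verify. The sketch below is only an illustration under stated assumptions: finite random sequences and the DFT (via `numpy.fft`) stand in for $L^2$ signals and the $L^2$-FT, and the names `x`, `h`, and `h2` are hypothetical. Zero-padding the DFT to at least the length of the linear convolution makes the product of transforms correspond to ordinary (not circular) convolution.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(50)  # finite-length input signal
h = rng.standard_normal(20)  # finite-length impulse response

# Part 1: y = x * h computed directly and via Y = H X.
y_direct = np.convolve(x, h)
n_fft = len(x) + len(h) - 1  # long enough that circular and linear convolution agree
Y = np.fft.fft(x, n_fft) * np.fft.fft(h, n_fft)
y_fft = np.fft.ifft(Y).real

# Part 2: the cascade h * h2 has frequency response H(w) H2(w).
h2 = rng.standard_normal(15)
h_cascade = np.convolve(h, h2)
n2 = len(h_cascade)
H_cascade = np.fft.fft(h_cascade, n2)
H_product = np.fft.fft(h, n2) * np.fft.fft(h2, n2)

print(np.max(np.abs(y_direct - y_fft)), np.max(np.abs(H_cascade - H_product)))
```

Both printed differences are at the level of floating-point rounding. Finite-length sequences are automatically absolutely summable, so this check does not exercise the subtle case of filters that are bounded but not BIBO stable; it only illustrates the algebraic content of the theorem.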

7. RIESZ BASIS, BIORTHOGONALITY, AND OTHER FINE POINTS

In a finite dimensional space such as the space of all $N$-component Euclidean vectors, the ideas of basis and
orthonormal basis are easy to appreciate. When we extend these ideas to infinite dimensional spaces (i.e.,
where the basis $\{g_n(t)\}$ has an infinite number of functions), a number of complications and subtleties arise.
Our aim is to point these out here. References for this section include [Riesz and Nagy, 1955], [Haaser
and Sullivan, 1971], [Young, 1980], [Chui, 1992a], and [Daubechies, 1992].

Readers familiar with Hilbert spaces will note that the space $L^2$ is a Hilbert space; all our developments
here are valid for any Hilbert space $\mathcal{H}$. Elements in $\mathcal{H}$ (vectors) are typically denoted $x$, $y$ and so forth.
When we deal with the Hilbert space $L^2$, the vectors are functions and are denoted as $x(t)$, $y(t)$ and so forth
for clarity. Similarly for the special case of Euclidean vectors we use bold face, e.g., $\mathbf{x}$, $\mathbf{y}$ and so forth. The
reader not familiar with Hilbert spaces can assume that all discussions are in $L^2$ and that $x$ is merely a
simplification of the notation $x(t)$.

7.1.  Finite  dimensional  vector  spaces 

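We will first look at the finite dimensional case and then proceed to the infinite dimensional case. Consider
an $N \times N$ matrix $\mathbf{F} = [\,\mathbf{f}_1 \ \mathbf{f}_2 \ \ldots \ \mathbf{f}_N\,]$. We assume that this is nonsingular, that is, the columns $\mathbf{f}_n$
are linearly independent. These column vectors form a basis for the $N$-dimensional Euclidean space $\mathbb{C}^N$ of
complex $N$-component vectors. This space is an example of a finite dimensional Hilbert space, with inner
product defined as $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{y}^\dagger \mathbf{x} = \sum_{n=1}^{N} x_n y_n^*$. The norm $\|\mathbf{x}\|$ induced by this inner product is defined as
$\|\mathbf{x}\| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}$. Thus $\|\mathbf{x}\|^2 = \mathbf{x}^\dagger \mathbf{x} = \sum_{n=1}^{N} |x_n|^2$.

Any vector $\mathbf{x} \in \mathbb{C}^N$ can be expressed as $\mathbf{x} = \sum_{n=1}^{N} c_n \mathbf{f}_n$ for some uniquely determined set of scalars $c_n$.
We can abbreviate this as $\mathbf{x} = \mathbf{F}\mathbf{c}$ where $\mathbf{c} = [\,c_1 \ c_2 \ \ldots \ c_N\,]^T$. The matrix $\mathbf{F}$ can be regarded as a linear
transformation from $\mathbb{C}^N$ to $\mathbb{C}^N$. The nonsingularity of $\mathbf{F}$ means that for every $\mathbf{x} \in \mathbb{C}^N$ we can find a unique
$\mathbf{c}$ such that $\mathbf{x} = \mathbf{F}\mathbf{c}$.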

Boundedness  of  F  and  its  Inverse. 

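In practice we have a further requirement, namely that if the norm $\|\mathbf{c}\|$ is "small" then $\|\mathbf{x}\|$ should
also be "small", and vice versa. This requirement implies, for example, that if there is a small error in the
transmission or estimate of the vector $\mathbf{c}$ then the corresponding error in $\mathbf{x}$ is also small. From the relation
$\mathbf{x} = \mathbf{F}\mathbf{c}$ we obtain

$$\|\mathbf{x}\|^2 = \mathbf{x}^\dagger \mathbf{x} = \mathbf{c}^\dagger \mathbf{F}^\dagger \mathbf{F} \mathbf{c} \qquad (7.1)$$

Letting $\lambda_M$ and $\lambda_m$ denote the maximum and minimum eigenvalues of $\mathbf{F}^\dagger \mathbf{F}$, it then follows that
$\|\mathbf{x}\|^2 \ge \lambda_m \|\mathbf{c}\|^2$ and that $\|\mathbf{x}\|^2 \le \lambda_M \|\mathbf{c}\|^2$. That is,

$$\lambda_m \|\mathbf{c}\|^2 \le \|\mathbf{x}\|^2 \le \lambda_M \|\mathbf{c}\|^2 \qquad (7.2)$$

with $0 < \lambda_m \le \lambda_M < \infty$, where $0 < \lambda_m$ follows from nonsingularity of $\mathbf{F}$. Thus the transformation $\mathbf{F}$ which
converts $\mathbf{c}$ into $\mathbf{x}$ has an amplification factor bounded by $\lambda_M$ in the sense that $\|\mathbf{x}\|^2 \le \lambda_M \|\mathbf{c}\|^2$. Similarly
the inverse transformation $\mathbf{G} = \mathbf{F}^{-1}$ which converts $\mathbf{x}$ into $\mathbf{c}$ has amplification bounded by $1/\lambda_m$. Since
$\lambda_M$ is finite, we say that $\mathbf{F}$ is a bounded linear transformation. And since $\lambda_m \ne 0$ we see that the inverse
transformation is also bounded.

Using $\mathbf{x} = \sum_n c_n \mathbf{f}_n$ and $\|\mathbf{c}\|^2 = \sum_n |c_n|^2$ we can rewrite the preceding inequality as

$$A \sum_n |c_n|^2 \le \Big\| \sum_n c_n \mathbf{f}_n \Big\|^2 \le B \sum_n |c_n|^2 \qquad (7.3)$$

where $A = \lambda_m > 0$ and $B = \lambda_M < \infty$, and all summations are for $1 \le n \le N$. Readers familiar with the idea
of a Riesz basis in infinite dimensional Hilbert spaces will notice that the above is in the form that agrees
with that definition. We will return to this later.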
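The bounds (7.2) and (7.3) can be checked numerically. The following is a minimal sketch, assuming a real $2 \times 2$ matrix so that the eigenvalues of $\mathbf{F}^T\mathbf{F}$ have a closed form; the particular matrix, the helper names, and the random test vectors are illustrative choices, not part of the text.

```python
import math
import random

def gram_extreme_eigenvalues(F):
    """Return (lambda_m, lambda_M), the extreme eigenvalues of F^T F, for a real 2x2 F."""
    (p, q), (r, s) = F
    a = p * p + r * r      # entry (1,1) of F^T F
    b = p * q + r * s      # off-diagonal entry of F^T F
    d = q * q + s * s      # entry (2,2) of F^T F
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def synthesize(F, c):
    """Return x = F c."""
    return [F[0][0] * c[0] + F[0][1] * c[1],
            F[1][0] * c[0] + F[1][1] * c[1]]

def energy(v):
    """Squared Euclidean norm."""
    return sum(t * t for t in v)

# Hypothetical basis: the columns f_1 = (1, 0) and f_2 = (1, 0.1) are nearly parallel.
F = [[1.0, 1.0],
     [0.0, 0.1]]
lam_m, lam_M = gram_extreme_eigenvalues(F)

random.seed(0)
ratios = []
for _ in range(1000):
    c = [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
    if energy(c) > 0.0:
        ratios.append(energy(synthesize(F, c)) / energy(c))

print("lambda_m =", lam_m, " lambda_M =", lam_M)
print("observed ||x||^2 / ||c||^2 lies in [", min(ratios), ",", max(ratios), "]")
print("bound on the gain of the inverse map, 1/lambda_m =", 1.0 / lam_m)
```

Because the two columns are nearly parallel, $\lambda_m$ is small and $1/\lambda_m$ is large: the basis is valid, but recovering $\mathbf{c}$ from a noisy $\mathbf{x}$ can amplify errors considerably.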

Biorthogonality 

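With $\mathbf{F}^{-1}$ denoted as $\mathbf{G}$, let $\mathbf{g}_k^\dagger$ denote the rows of $\mathbf{G}$, that is

$$\mathbf{F} = [\, \mathbf{f}_1 \ \mathbf{f}_2 \ \ldots \ \mathbf{f}_N \,], \qquad \mathbf{G} = \begin{bmatrix} \mathbf{g}_1^\dagger \\ \mathbf{g}_2^\dagger \\ \vdots \\ \mathbf{g}_N^\dagger \end{bmatrix} \qquad (7.4)$$

The property $\mathbf{G}\mathbf{F} = \mathbf{I}$ implies $\mathbf{g}_k^\dagger \mathbf{f}_n = \delta(k - n)$, that is

$$\langle \mathbf{f}_n, \mathbf{g}_k \rangle = \delta(k - n) \qquad (7.5)$$

for $1 \le k, n \le N$. Equivalently, $\langle \mathbf{g}_k, \mathbf{f}_n \rangle = \delta(k - n)$.

Two sets of vectors $\{\mathbf{f}_n\}$ and $\{\mathbf{g}_k\}$ satisfying (7.5) are said to be biorthogonal. Since $\mathbf{c} = \mathbf{F}^{-1}\mathbf{x} = \mathbf{G}\mathbf{x}$
we can write the elements of $\mathbf{c}$ as $c_n = \mathbf{g}_n^\dagger \mathbf{x} = \langle \mathbf{x}, \mathbf{g}_n \rangle$. Then $\mathbf{x} = \sum_n c_n \mathbf{f}_n = \sum_n \langle \mathbf{x}, \mathbf{g}_n \rangle \mathbf{f}_n$. Since $\mathbf{G}^\dagger$ is a
nonsingular matrix, we can use its columns $\mathbf{g}_n$ (instead of the columns of $\mathbf{F}$) to obtain a similar development,
and express the arbitrary vector $\mathbf{x} \in C^N$ as $\mathbf{x} = \sum_n \langle \mathbf{x}, \mathbf{f}_n \rangle \mathbf{g}_n$. Thus

$$\mathbf{x} = \sum_n \langle \mathbf{x}, \mathbf{g}_n \rangle \mathbf{f}_n = \sum_n \langle \mathbf{x}, \mathbf{f}_n \rangle \mathbf{g}_n, \qquad (7.6)$$

where the summations are for $1 \le n \le N$. By using the expressions $c_n = \langle \mathbf{x}, \mathbf{g}_n \rangle$ and $\mathbf{x} = \sum_n c_n \mathbf{f}_n$ we can
rearrange the inequality (7.3) into $B^{-1}\|\mathbf{x}\|^2 \le \sum_n |\langle \mathbf{x}, \mathbf{g}_n \rangle|^2 \le A^{-1}\|\mathbf{x}\|^2$. With the columns $\mathbf{g}_n$ of $\mathbf{G}^\dagger$ (rather
than the columns of $\mathbf{F}$) used as the basis for $C^N$ we obtain similarly

$$A\|\mathbf{x}\|^2 \le \sum_n |\langle \mathbf{x}, \mathbf{f}_n \rangle|^2 \le B\|\mathbf{x}\|^2 \qquad (7.7)$$

where $1 \le n \le N$, and $A = \lambda_m$, $B = \lambda_M$ again. Readers familiar with the idea of a frame in an infinite
dimensional Hilbert space will recognize that the above inequality defines a frame $\{\mathbf{f}_n\}$. We will return to
this in Sec. 8.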
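The biorthogonality relation (7.5) and the two expansions in (7.6) can be verified on a small example. This is a sketch only, assuming a real $2 \times 2$ basis matrix (so the conjugate transpose reduces to the transpose); the matrix, the test vector, and the helper names are hypothetical.

```python
def dot(u, v):
    """Real inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def inverse_2x2(F):
    """Inverse of a real 2x2 matrix given as a list of rows."""
    (p, q), (r, s) = F
    det = p * s - q * r
    if det == 0:
        raise ValueError("F must be nonsingular")
    return [[s / det, -q / det],
            [-r / det, p / det]]

def expand(coeffs, vectors):
    """Return sum_n coeffs[n] * vectors[n]."""
    dim = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(dim)]

F = [[2.0, 1.0],
     [0.0, 1.0]]
f = [[F[0][n], F[1][n]] for n in range(2)]   # basis vectors f_n: the columns of F
g = inverse_2x2(F)                           # dual vectors g_k: the rows of G = F^{-1}

# Biorthogonality (7.5): <f_n, g_k> should be 1 when n == k and 0 otherwise.
cross = [[dot(f[n], g[k]) for k in range(2)] for n in range(2)]

x = [3.0, -1.0]
x_from_f = expand([dot(x, gn) for gn in g], f)   # x = sum <x, g_n> f_n
x_from_g = expand([dot(x, fn) for fn in f], g)   # x = sum <x, f_n> g_n

print(cross)
print(x_from_f, x_from_g)
```

Both expansions return the original vector, which is the finite dimensional content of (7.6).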

Orthonormality. 

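The basis $\mathbf{f}_n$ is said to be orthonormal if $\langle \mathbf{f}_k, \mathbf{f}_n \rangle = \delta(k - n)$, i.e., $\mathbf{f}_n^\dagger \mathbf{f}_k = \delta(k - n)$. Equivalently $\mathbf{F}$ is unitary,
that is $\mathbf{F}^\dagger \mathbf{F} = \mathbf{I}$. In this case the rows of the inverse matrix $\mathbf{G}$ are the quantities $\mathbf{f}_k^\dagger$. Since $\mathbf{F}^\dagger \mathbf{F} = \mathbf{I}$ we have
$\lambda_m = \lambda_M = 1$, that is, $A = B = 1$. With this, (7.2) becomes $\|\mathbf{c}\| = \|\mathbf{x}\|$, that is, $\sum_n |c_n|^2 = \| \sum_n c_n \mathbf{f}_n \|^2$.
Thus (7.3) is a generalization of the orthonormal situation. Similarly biorthogonality (7.5) is a generalization
of orthonormality.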

7.2.  Basis  in  infinite  dimensional  spaces 

When the simple idea of a basis in a finite dimensional space (e.g., the Euclidean space $C^N$) is extended
to infinite dimensions, several new issues arise which make the problem nontrivial. Thus consider the
sequence of functions $\{f_n\}$, $1 \le n \le \infty$ in a Hilbert space $\mathcal{H}$. Because of the infinite range of $n$ we now
have to consider linear combinations of the form $\sum_{n=1}^{\infty} c_n f_n$. The problem that immediately arises is one of
convergence. For arbitrary sequences $c_n$ this sum need not converge, so we have to replace the statement
"all linear combinations" with something else.†

† In our review we will use $1 \le n \le \infty$ to be consistent with standard math texts, but all the crucial
results hold for doubly infinite sequences and summations, i.e., for the case $-\infty \le n \le \infty$. This is what we
need in the case of Fourier and wavelet bases, see for example Eq. (2.3) and (2.4).
Closure of span. Let us first define the set of all finite linear combinations of the form $\sum_{n=1}^{N} c_n f_n$,
where $N$ varies over all integers $\ge 1$. This is called the span of $\{f_n\}$. Now suppose $x \in \mathcal{H}$ is a vector not
necessarily in the span of $\{f_n\}$ but can be approximated as closely as we wish by vectors in the span. In
other words, given an $\epsilon > 0$ we can find $N$ and the sequence of constants $c_{nN}$ such that

$$\Big\| x - \sum_{n=1}^{N} c_{nN} f_n \Big\| < \epsilon \qquad (7.8)$$

where $\|x\|$ is the norm defined as $\|x\| = \sqrt{\langle x, x \rangle}$. If we append all such vectors $x$ to the span of $\{f_n\}$ we get
the closure of the span of $\{f_n\}$.‡ Note that $c_{nN}$ in general depends on $\epsilon$ since $N$ depends on $\epsilon$.

Completeness. We say that a sequence of vectors $\{f_n\}$ is complete in $\mathcal{H}$ if the closure of the linear span
of $\{f_n\}$ equals $\mathcal{H}$. Thus any $x \in \mathcal{H}$ can be approximated, as closely as we wish, by finite linear combinations
of $f_n$ in the sense (7.8). This is also expressed by saying that the linear span of $\{f_n\}$ is dense in $\mathcal{H}$. It
turns out that completeness of $\{f_n\}$ in a Hilbert space is equivalent to the statement that the only vector
orthogonal to all $f_n$ is the zero vector.

Infinite summations. When we write $x = \sum_{n=1}^{\infty} c_n f_n$ we mean that the infinite summation converges
to $x$ in the norm of $\mathcal{H}$. In other words, given $\epsilon > 0$ there exists $n_0$ such that

$$\Big\| x - \sum_{n=1}^{N} c_n f_n \Big\| < \epsilon \quad \text{for all } N \ge n_0. \qquad (7.9)$$

This statement is stronger than saying that $x$ is in the closure of the linear span of $\{f_n\}$. The latter statement
only requires (7.8), where $N$, and hence $c_{nN}$, depends on $\epsilon$. In the former statement (7.9), $\{c_n\}$ is a fixed
sequence.

Linear  independence. 

Let $\{f_n\}$, $n = 1, 2, \ldots$ be a sequence of vectors in an infinite dimensional Hilbert space $\mathcal{H}$. Unlike in a
finite dimensional space, one has to distinguish between several types of linear independence.

Type 1: $\{f_n\}$ has finite linear independence if $\sum_{n=1}^{N} c_n f_n = 0$ for any finite $N$ implies $c_n = 0$, $1 \le n \le N$.

Type 2: $\{f_n\}$ is $\omega$-independent if $\sum_{n=1}^{\infty} c_n f_n = 0$ implies $c_n = 0$ for all $n$ (where the infinite sum is
interpreted as explained above).

Type 3: $\{f_n\}$ is minimal if none of the $f_m$ is in the closure of the span of the remaining set of $f_n$.

Type 3 independence implies Type 2, which in turn implies Type 1. Thus, Type 3 is the strongest kind of
linear independence. The reason why it is stronger than Type 2 is this: Type 2 implies that we cannot have
$f_m = \sum_{n \ne m} c_n f_n$. However, for a Type 2 independent sequence $\{f_n\}$, it is possible that we can make

$$\Big\| f_m - \sum_{\substack{n=1 \\ n \ne m}}^{N} c_{nN} f_n \Big\| < \epsilon \qquad (7.10)$$

for any given $\epsilon > 0$ by choosing $N$ and $c_{nN}$ properly.§ Type 3 linear independence prohibits even this.
Example 7.2 will make this distinction clearer.

‡ The term "closure" has its origin in the theory of metric spaces and, more generally, topological vector
spaces. We will not require the deeper, more general meaning here.

Basis or Schauder basis. A sequence of vectors $\{f_n\}$ in $\mathcal{H}$ is a Schauder basis for $\mathcal{H}$ if (a) any $x \in \mathcal{H}$
can be expressed as $x = \sum_{n=1}^{\infty} c_n f_n$ and (b) the sequence of scalars $\{c_n\}$ is unique for a given $x$. The second
condition can be replaced with the statement that $\{f_n\}$ is $\omega$-independent. A subtle result for Hilbert spaces
[Young, 1980] is that a Schauder basis automatically satisfies minimality (i.e., Type 3 independence).

A Schauder basis is $\omega$-independent and complete in the sense defined above. Conversely, $\omega$-independence
and completeness do not imply that $\{f_n\}$ is a Schauder basis; completeness only means that we can
approximate any vector as closely as we wish in the sense of (7.8), where the $c_{nN}$ depend on $N$. In this chapter
"independence" (or linear independence) stands for $\omega$-independence. Similarly "basis" stands for Schauder
basis unless qualified otherwise.

7.3.  Riesz  Basis 

Any basis $\{f_n\}$ in a finite dimensional space satisfies (7.3), which in turn ensures that the transformation
from $x$ to $\{c_n\}$ and that from $\{c_n\}$ to $x$ are stable. For a basis in an infinite dimensional space, Eq. (7.3) is
not automatically guaranteed, as shown by the following example.

Example 7.1. Let $\{e_n\}$, $1 \le n \le \infty$ be an orthonormal basis in a Hilbert space $\mathcal{H}$ and define the
sequence $\{f_n\}$ by $f_n = e_n/n$. Then we can show that $\{f_n\}$ is still a basis, i.e., it satisfies the definition of a
Schauder basis. Suppose we pick $x = \epsilon e_k$ for some $k$. Then $x = \sum_n c_n f_n$ with $c_k = \epsilon k$, and $c_n = 0$ for all
other $n$. Thus $\sum_n |c_n|^2 = \epsilon^2 k^2$ and grows as $k$ increases, though $\|x\| = \epsilon$ for all $k$. That is, a "small" error in
$x$ can get amplified in an unbounded manner. Recall that this could never happen in the finite dimensional
case (Sec. 7.1) because $A > 0$ in Eq. (7.3). For our basis $\{f_n\}$, we can indeed show that there is no $A > 0$
satisfying (7.3)! To see this let $c_n = 0$ for all $n$ except that $c_k = 1$. Then $\sum_n c_n f_n = f_k = e_k/k$ and has
norm $1/k$. So (7.3) reads $A \le 1/k^2 \le B$ for all $k \ge 1$. This is not possible with $A > 0$. ◊
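The unbounded amplification in Example 7.1 is easy to reproduce numerically. The sketch below assumes vectors are stored by their coordinates $\langle x, e_n \rangle$ in the orthonormal basis and truncated to finitely many entries; the function names and the chosen values of $k$ are illustrative only.

```python
import math

def coefficients_in_scaled_basis(x):
    """Coefficients c_n of x in the basis f_n = e_n / n.

    x holds the orthonormal coordinates <x, e_n>, with x[0] the e_1 coordinate.
    Since x = sum_n x_n e_n = sum_n (n x_n) f_n, we get c_n = n x_n.
    """
    return [(n + 1) * xn for n, xn in enumerate(x)]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

eps = 1e-3
tested_k = (1, 10, 100, 1000)
gains = []
for k in tested_k:
    x = [0.0] * k
    x[k - 1] = eps                    # x = eps * e_k, so ||x|| = eps for every k
    c = coefficients_in_scaled_basis(x)
    gains.append(norm(c) / norm(x))   # ratio ||c|| / ||x||, which equals k
    print("k =", k, " ||x|| =", norm(x), " ||c|| =", norm(c))
```

The ratio $\|c\|/\|x\|$ equals $k$, so no finite constant bounds the map from $x$ to $\{c_n\}$; this is the failure of $A > 0$ in (7.3).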

If $\{e_n\}$, $1 \le n \le \infty$ is an orthonormal basis in an infinite dimensional Hilbert space $\mathcal{H}$, then any vector
$x$ can be expressed uniquely as $x = \sum_{n=1}^{\infty} c_n e_n$ where

$$\|x\|^2 = \sum_{n=1}^{\infty} |c_n|^2.$$

This property automatically ensures the stability of the transformations from $x$ to $\{c_n\}$ and vice versa. The
Riesz basis is defined such that this property is made more general.†

§ As we make $\epsilon$ smaller and smaller, we may need to change $N$ and all coefficients $c_{nN}$. Therefore, this
does not imply $f_m = \sum_{n \ne m} c_n f_n$.

Definition of a Riesz basis. A sequence $\{f_n\}$, $1 \le n \le \infty$ in a Hilbert space $\mathcal{H}$ is a Riesz basis if it
is complete (Sec. 7.2) and there exist constants $A$ and $B$ such that $0 < A \le B < \infty$ and

$$A \sum_{n=1}^{\infty} |c_n|^2 \le \Big\| \sum_{n=1}^{\infty} c_n f_n \Big\|^2 \le B \sum_{n=1}^{\infty} |c_n|^2 \qquad (7.11)$$

for all choices of $c_n$ satisfying $\sum_n |c_n|^2 < \infty$. ◊

In a finite dimensional Hilbert space, $A$ and $B$ come from the extreme eigenvalues of a nonsingular
matrix $\mathbf{F}^\dagger \mathbf{F}$, so $A > 0$ and $B < \infty$ automatically (Sec. 7.1). That is, any basis in a finite dimensional space
is a Riesz basis. As Example 7.1 shows, this may not be the case in infinite dimensions.

Unconditional basis. It can be shown that a Riesz basis is an unconditional basis, that is, any
reordering of $\{f_n\}$ is also a basis (and the new $c_n$ are the correspondingly reordered versions). This is a
nontrivial statement; an arbitrary (Schauder) basis is not necessarily unconditional; in fact the space of $L^1$
functions (which is a Banach space, not a Hilbert space) does not have an unconditional basis.

Role  of  the  Constants  A  and  B 

1. Strongest linear independence. The condition $A > 0$ means, in particular, that $\sum_n c_n f_n \ne 0$ unless $c_n$
is zero for all $n$. This is just $\omega$-independence. Actually the condition $A > 0$ means that the vectors $\{f_n\}$
are independent in the strongest sense (Type 3), that is, $\{f_n\}$ is minimal. To see this assume this is
not the case. That is, suppose some vector $f_m$ is in the closure of the span of the others. Then, given
arbitrary $\epsilon > 0$ we can find $N$ and $c_{nN}$ satisfying (7.8) with $x = f_m$ (the sum excluding $n = m$, as in (7.10)).
Defining $c_n = -c_{nN}$ for $n \ne m$ and $c_m = 1$, we see that (7.11) implies $A(1 + \sum_{n \ne m} |c_{nN}|^2) < \epsilon^2$. Since $\epsilon$ is
arbitrary, this is not possible for $A > 0$.

2. Distance between vectors. The condition $A > 0$ also implies that no two vectors in $\{f_n\}$ can get
"arbitrarily close". To see this, choose $c_k = -c_m = 1$ for some $k, m$ and $c_n = 0$ for all other $n$. Then
(7.11) gives $2A \le \|f_k - f_m\|^2 \le 2B$. That is, the distance between any two vectors is at least $\sqrt{2A}$, at
most $\sqrt{2B}$.

3. Bounded basis. A Riesz basis is a bounded basis in the sense that $\|f_n\|$ cannot get arbitrarily large. In
fact, by choosing $c_n = 0$ for all but one value of $n$, we can see that $0 < A \le \|f_n\|^2 \le B < \infty$. That
is, the norms of the vectors in the basis cannot get arbitrarily small or large. Note that the basis in
Example 7.1 violates this, since $\|f_n\| = 1/n$. Therefore, Example 7.1 is only a Schauder basis and not
a Riesz basis.

4. Stability of basis. The condition $A > 0$ yields $\sum_n |c_n|^2 \le A^{-1}\|x\|^2$, where $x = \sum_n c_n f_n$. This means
that the transformation from the vector $x$ to the sequence $\{c_n\}$ is bounded; so a small error in $x$ is not
amplified in an unbounded manner. Similarly the inequality $\|x\|^2 \le B \sum_n |c_n|^2$ shows that the role of $B$
is to ensure that the inverse transformation from $\{c_n\}$ to $x$ is bounded. Summarizing, the transformation
from $x$ to $\{c_n\}$ is numerically stable (i.e., small errors are not severely amplified) because $A > 0$, and the
reconstruction of $x$ from $\{c_n\}$ is numerically stable because $B < \infty$.

5. Orthonormality. For a Riesz basis with $A = B = 1$ the condition (7.11) reduces to $\sum_n |c_n|^2 =
\| \sum_n c_n f_n \|^2$. It can be shown that such a Riesz basis is just an orthonormal basis. The properties
listed above show that the Riesz basis is as good as an orthonormal basis in most applications. It can
be shown that any Riesz basis can be obtained from an orthonormal basis by means of a bounded linear
transformation with bounded linear inverse.

† For readers familiar with bounded linear transformations in Hilbert spaces, we state that a basis is a
Riesz basis if and only if it is related to an orthonormal basis via a bounded linear transformation with
bounded inverse.

Example 7.2. Mishaps with a system which is not a Riesz basis. Let us modify Example 7.1 to
$f_n = (e_n/n) + e_1$, $n \ge 1$, where $\{e_n\}$ is an orthonormal basis. It turns out that as $n \to \infty$ the vectors $f_n$ get
arbitrarily closer together (though $\|f_n\|$ approaches unity from above). Formally $f_n - f_m = (e_n/n) - (e_m/m)$,
so $\|f_n - f_m\|^2 = (1/n^2) + (1/m^2)$, which goes to zero as $n, m \to \infty$. Thus there is no $A > 0$ satisfying (7.11)
(because of comment 2 above). This, then, is not a Riesz basis (in fact this is not even a Schauder basis, see
below). This example also has $B = \infty$. To see this let $c_n = 1/n$; then $\sum_n |c_n|^2$ converges but $\| \sum_{n=1}^{N} c_n f_n \|^2$
does not converge as $N \to \infty$ (the coefficient of $e_1$ contains the divergent harmonic sum $\sum_n 1/n$), so (7.11)
is not satisfied for finite $B$. Such mishaps cannot occur with a Riesz basis. ◊

In this example $\{f_n\}$ is not minimal (which is Type 3 independence). To see this note that $f_1 = 2e_1$, so
$\|f_1 - 2f_n\| = 2/n$ gets arbitrarily small as $n$ increases to infinity. So $f_1$ is in the closure of the span of
$\{f_n\}$, $n \ne 1$. However $\{f_n\}$ is $\omega$-independent; there is no nonzero sequence $\{c_n\}$ such that
$\| \sum_{n=1}^{N} c_n f_n \| \to 0$ as $N \to \infty$. In any case, the fact that $\{f_n\}$ is not minimal (i.e., not independent
in the strongest sense) shows that it is not even a Schauder basis (see Sec. 7.2).
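The three claims in Example 7.2 can be illustrated with truncated coordinate vectors. This is a rough numerical sketch, assuming each $f_n$ is stored by its coordinates in $\{e_n\}$ and cut off at a finite dimension; all function names are hypothetical.

```python
import math

def f(n, dim):
    """Coordinates of f_n = e_n / n + e_1 in the orthonormal basis, truncated to dim entries."""
    v = [0.0] * dim
    v[0] += 1.0            # the e_1 term
    v[n - 1] += 1.0 / n    # the e_n / n term (for n = 1 this gives f_1 = 2 e_1)
    return v

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def distance(n, m):
    """|| f_n - f_m ||."""
    dim = max(n, m)
    return norm([a - b for a, b in zip(f(n, dim), f(m, dim))])

def partial_sum_norm(N):
    """|| sum_{n=1}^{N} c_n f_n || with c_n = 1/n, a square summable sequence."""
    s = [0.0] * N
    for n in range(1, N + 1):
        for i, t in enumerate(f(n, N)):
            s[i] += t / n
    return norm(s)

def distance_f1_to_2fn(n):
    """|| f_1 - 2 f_n ||, which equals 2/n for n >= 2."""
    return norm([a - 2.0 * b for a, b in zip(f(1, n), f(n, n))])

# No lower bound A > 0: distinct vectors get arbitrarily close.
print([distance(n, 2 * n) for n in (10, 100, 1000)])
# No finite upper bound B: the partial sums keep growing (roughly like log N).
growth = [partial_sum_norm(N) for N in (10, 100, 1000)]
print(growth)
# Not minimal: f_1 is approached by multiples of the other vectors.
print([distance_f1_to_2fn(n) for n in (10, 100, 1000)])
```

The growth in the second printout is slow because it is driven by the harmonic sum in the $e_1$ coordinate, but it is unbounded.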
7.4. Biorthogonal Systems, Riesz Bases and Inner Products

When discussing finite dimensional Hilbert spaces (Sec. 7.1) we found that given a basis $\mathbf{f}_n$ (columns of a
nonsingular matrix) we can express any vector $\mathbf{x}$ as a linear combination $\mathbf{x} = \sum_n \langle \mathbf{x}, \mathbf{g}_n \rangle \mathbf{f}_n$ where $\mathbf{g}_n$ is such
that the biorthogonality property $\langle \mathbf{f}_m, \mathbf{g}_n \rangle = \delta(m - n)$ holds. A similar result is true for infinite dimensional
Hilbert spaces.

Theorem 7.1. Biorthogonality and Riesz basis. Let $\{f_n\}$ be a basis in a Hilbert space $\mathcal{H}$. Then
there is a unique sequence $\{g_n\}$ biorthogonal to $\{f_n\}$, that is,

$$\langle f_m, g_n \rangle = \delta(m - n) \quad \text{(biorthogonality)}. \qquad (7.12)$$

Moreover the unique expansion of any $x \in \mathcal{H}$ in terms of the basis $\{f_n\}$ is given by

$$x = \sum_{n=1}^{\infty} \langle x, g_n \rangle f_n. \qquad (7.13)$$

It is also true that the biorthogonal sequence $\{g_n\}$ is a basis and that $x = \sum_{n=1}^{\infty} \langle x, f_n \rangle g_n$. Moreover if $\{f_n\}$
is a Riesz basis, then $\sum_n |\langle x, g_n \rangle|^2$ and $\sum_n |\langle x, f_n \rangle|^2$ are finite, and we have

$$A\|x\|^2 \le \sum_{n=1}^{\infty} |\langle x, f_n \rangle|^2 \le B\|x\|^2 \qquad (7.14)$$

where $A$ and $B$ are the same constants as in the definition (7.11) of a Riesz basis. ◊

This beautiful result resembles the finite dimensional version (Sec. 7.1) where $f_n$ corresponds to the
columns of a matrix and $g_n$ corresponds to the rows (conjugated) of the inverse matrix. In this sense we can
regard the biorthogonal pair of sequences $\{f_n\}$, $\{g_n\}$ as inverses of each other. Both of these are bases for
$\mathcal{H}$. A proof of the above result can be obtained by combining the ideas on p. 28-32 of [Young, 1980]. The
theorem implies, in particular, that if $\{f_n\}$ is a Riesz basis, then any vector in the space can be written in
the form $\sum_{n=1}^{\infty} c_n f_n$, where $\{c_n\} \in \ell^2$.

Summary on Riesz basis. The Riesz basis $\{f_n\}$ in a Hilbert space $\mathcal{H}$ was defined in Sec. 7.3. The set
$\{f_n\}$ is a complete set of vectors, linearly independent in the strongest sense (i.e., Type 3 or minimal). It is a
bounded basis with bounded inverse. Any two vectors are separated by at least $\sqrt{2A}$, that is $\|f_n - f_m\|^2 \ge 2A$.
The norm of each basis vector is bounded as $\|f_n\| \le \sqrt{B}$. In the expression $x = \sum_n c_n f_n$ the computation
of $x$ from $c_n$ as well as the computation of $c_n$ from $x$ are numerically stable, because $B < \infty$ and $A > 0$
respectively. A Riesz basis with $A = B = 1$ is an orthonormal basis. In fact, any Riesz basis can be obtained
from an orthonormal basis, via a bounded linear transformation with a bounded inverse. Given any basis
$\{f_n\}$ in a Hilbert space, there exists a unique biorthogonal sequence $\{g_n\}$ such that we can express any $x \in \mathcal{H}$
as $x = \sum_{n=1}^{\infty} \langle x, g_n \rangle f_n$ as well as $x = \sum_{n=1}^{\infty} \langle x, f_n \rangle g_n$; if this basis is also a Riesz basis then $\sum_n |\langle x, f_n \rangle|^2$
and $\sum_n |\langle x, g_n \rangle|^2$ are finite. If $\{f_n\}$ is a Riesz basis, then any vector $x \in \mathcal{H}$ can be written in the form
$x = \sum_{n=1}^{\infty} c_n f_n$, where $\{c_n\} \in \ell^2$.

8.  FRAMES  IN  HILBERT  SPACES 

A frame in a Hilbert space $\mathcal{H}$ is a sequence of vectors $\{f_n\}$ with certain special properties. While a frame is
not necessarily a basis, it shares some properties of a basis. For example we can express any vector $x \in \mathcal{H}$ as
a linear combination of the frame elements, i.e., $x = \sum_n c_n f_n$. But frames in general have redundancy, that
is, the frame vectors are not necessarily linearly independent, even in the weakest sense defined in Sec. 7.2.
We will see that the Riesz basis (hence any orthonormal basis) is a special case of frames. The concept of a
frame is useful when discussing the relation between wavelets, short time Fourier transforms and filter banks.
The idea of frames was introduced by Duffin and Schaeffer [1952], and used in the context of wavelets and
STFT by Daubechies [1992]. Excellent tutorials can be found in Young [1980] and Heil and Walnut [1989].

Definition of a frame. A sequence of vectors $\{f_n\}$ in a (possibly infinite dimensional) Hilbert space
$\mathcal{H}$ is a frame if there exist constants $A$ and $B$ with $0 < A \le B < \infty$ such that for any $x \in \mathcal{H}$ we have

$$A\|x\|^2 \le \sum_{n=1}^{\infty} |\langle x, f_n \rangle|^2 \le B\|x\|^2. \qquad (8.1)$$

The constants $A$ and $B$ are called frame bounds. ◊

In Sec. 7.4 we saw that a Riesz basis, which by definition satisfies (7.11), also satisfies (7.14), which is
precisely the frame definition! A Riesz basis is, therefore, also a frame. But it is a special case of a frame
where the set of vectors is minimal (see below).

Any frame is complete. That is, if a vector $x \in \mathcal{H}$ is orthogonal to all elements in $\{f_n\}$ then $x = 0$
(otherwise $A > 0$ is violated). Thus, any $x \in \mathcal{H}$ is in the closure of the span of the frame. In fact, we
will see that more is true, namely we can express $x = \sum_n c_n f_n$, though $\{c_n\}$ may not be unique. The frame
elements are not necessarily linearly independent, as demonstrated by examples below. A frame, then, is
not necessarily a basis. Compare (8.1) with the Riesz basis definition (7.11), where the left inequality forced
the vectors to be linearly independent (in fact minimal). The left inequality for a frame only ensures
completeness, not linear independence.

8.1.  Representing  Arbitrary  Vectors  in  Terms  of  Frame  Elements 

We will see later that, given a frame $\{f_n\}$ we can associate with it another sequence $\{g_n\}$ called the dual
frame, such that any element $x \in \mathcal{H}$ can be represented as $x = \sum_{n=1}^{\infty} \langle x, g_n \rangle f_n$. It turns out that we can also
write $x = \sum_{n=1}^{\infty} \langle x, f_n \rangle g_n$. This representation in terms of $\{f_n\}$ and $\{g_n\}$ resembles the biorthogonal system
discussed in Sec. 7.4, but we will point out some differences later.

Stability of computations. To obtain the representation $x = \sum_n \langle x, f_n \rangle g_n$ we compute (at least
conceptually) the coefficients $\langle x, f_n \rangle$ for all $n$. This computation is a linear transformation from $\mathcal{H}$ to the space
of sequences. The inverse transform computes $x$ from this sequence by using the formula $x = \sum_n \langle x, f_n \rangle g_n$.
The condition $B < \infty$ in the frame definition ensures that the transformation from $x$ to $\langle x, f_n \rangle$ is bounded.
Similarly the condition $A > 0$ ensures that the inverse transformation from $\langle x, f_n \rangle$ to $x$ is bounded. Thus
the conditions $A > 0$ and $B < \infty$ ensure stability; small errors in one domain are not arbitrarily amplified in
the other domain. A similar advantage was pointed out in Sec. 7.3 for the Riesz basis; for arbitrary bases
in infinite dimensional spaces such an advantage cannot be claimed (Example 8.1).

Instead of $x = \sum_n \langle x, f_n \rangle g_n$, if we wish to use the dual representation $x = \sum_n \langle x, g_n \rangle f_n$, then we
would have to compute $\langle x, g_n \rangle$ and so forth; then the roles of $A$ and $B$ are taken up by $1/B$ and $1/A$
respectively, and similar discussions hold. This is summarized in Fig. 8.1.


Fig. 8.1. Representation of $x$ using frame elements $\{f_n\}$. The transformations from $x$ to $\{c_n\}$ and
vice versa are stable. (Figure not reproduced; one surviving label reads "Stable because $B < \infty$".)

8.2.  Exact  Frames,  Tight  Frames,  Riesz  Bases,  and  Orthonormal  Bases 

The resemblance between a Riesz basis and a frame is striking. Compare (7.11) with (8.1). One might
wonder what the precise relation is. So far we know that a Riesz basis is a frame. To go deeper, we need
a definition: a frame $\{f_n\}$ which ceases to be a frame if any element $f_k$ is deleted is said to be an exact
frame. Such a frame has no redundancy. A frame with $A = B$ is said to be a tight frame. The defining
property reduces to $\|x\|^2 = A^{-1} \sum_n |\langle x, f_n \rangle|^2$, resembling Parseval's theorem for an orthonormal basis. A
frame is normalized if $\|f_n\| = 1$ for all $n$. The following facts concerning exact frames and tight frames are
fundamental.

1. A tight frame with $A = B = 1$ and $\|f_n\| = 1$ for all $n$ (i.e., a normalized tight frame with frame bound
$= 1$) is an orthonormal basis [Daubechies, 1992].

2. $\{f_n\}$ is an exact frame if and only if it is a Riesz basis [Young, 1980]. Moreover, if a frame is not exact
then it cannot be a basis [Heil and Walnut, 1989]. Thus if a frame is a basis it is certainly a Riesz basis.

3. Since an orthonormal basis is a Riesz basis, a normalized tight frame with frame bound $= 1$ is
automatically an exact frame.

Examples. 

We now provide some examples which serve to clarify the preceding concepts and definitions. In these
examples the sequence $\{e_n\}$, $n \ge 1$ is an orthonormal basis for $\mathcal{H}$. Thus, $\{e_n\}$ is a tight frame with $A = B = 1$,
and $\|e_n\| = 1$.

Example 8.1. Let $f_n = e_n/n$ as in Example 7.1. Then $\{f_n\}$ is still a (Schauder) basis for $\mathcal{H}$ but it is
not a frame. In fact this satisfies (8.1) only with $A = 0$. That is, the inverse transformation (reconstruction)
from $\langle x, f_n \rangle$ to $x$ is not bounded. To see why $A = 0$, note that if we let $x = e_k$ for some $k > 0$ then $\|x\| = 1$
whereas $\sum_n |\langle x, f_n \rangle|^2 = 1/k^2$. The first inequality in the frame definition becomes $A \le 1/k^2$, which cannot
be satisfied for all $k$ unless $A = 0$. In this example a finite $B$ works because $|\langle x, f_n \rangle| = |\langle x, e_n \rangle|/n$ for each
$n$. So $\sum_n |\langle x, f_n \rangle|^2 \le \sum_n |\langle x, e_n \rangle|^2 = \|x\|^2$. ◊

Example 8.2. Suppose we modify the above example as follows: define $f_n = (e_n/n) + e_1$. We know that
this is no longer a basis (Example 7.2). We now have $B = \infty$ in the frame definition, so this is not a frame.
To verify this, let $x = e_1$ so $\|x\| = 1$. Then $\langle x, f_n \rangle = 1$ for all $n > 1$, so $\sum_n |\langle x, f_n \rangle|^2$ does not converge to a
finite value. ◊

Example 8.3. Consider the sequence of vectors $\{e_1, e_1, e_2, e_2, \ldots\}$. This is a tight frame with frame
bounds $A = B = 2$. Note that even though the vectors are normalized and the frame is tight, this is not an
orthonormal basis. It has a redundancy of two in the sense that each vector is repeated twice. This frame
is not even a basis, therefore not a Riesz basis. ◊

Example 8.4. Consider the sequence of vectors $\{e_1, e_2/\sqrt{2}, e_2/\sqrt{2}, e_3/\sqrt{3}, e_3/\sqrt{3}, e_3/\sqrt{3}, \ldots\}$. Again there
is redundancy (repeated vectors), so it is not a basis. It is a tight frame with $A = B = 1$, but not an exact
frame. ◊

Frame bounds and redundancy. For a tight frame with unit norm vectors $f_n$, the frame bound
measures the redundancy. In Example 8.3 the redundancy is two (every vector repeated twice) and indeed
$A = B = 2$. In Example 8.4, where we still have redundancy, the frame bound $A = B = 1$ does not indicate
it. The frame bound of a tight frame measures redundancy only if the vectors $f_n$ have unit norm as in
Example 8.3.
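The frame bounds quoted for Examples 8.3 and 8.4 can be checked on finite truncations. The sketch below assumes the repeated-vector pattern $\{e_1, e_2/\sqrt{2}, e_2/\sqrt{2}, \ldots\}$ for Example 8.4 and works in a hypothetical 6-dimensional real space; for a tight frame the ratio $\sum_n |\langle x, f_n \rangle|^2 / \|x\|^2$ should equal $A$ for every nonzero $x$.

```python
import math
import random

def unit(k, dim):
    """The k-th orthonormal coordinate vector e_k (1-indexed)."""
    v = [0.0] * dim
    v[k - 1] = 1.0
    return v

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def frame_8_3(dim):
    """{e_1, e_1, e_2, e_2, ...} truncated to e_1 .. e_dim."""
    return [unit(k, dim) for k in range(1, dim + 1) for _ in range(2)]

def frame_8_4(dim):
    """{e_1, e_2/sqrt(2) twice, e_3/sqrt(3) three times, ...} truncated likewise."""
    return [[t / math.sqrt(k) for t in unit(k, dim)]
            for k in range(1, dim + 1) for _ in range(k)]

def energy_ratio(frame, x):
    """sum_n <x, f_n>^2 divided by ||x||^2."""
    return sum(dot(x, fn) ** 2 for fn in frame) / dot(x, x)

random.seed(1)
dim = 6
x = [random.uniform(-1.0, 1.0) for _ in range(dim)]
ratio_8_3 = energy_ratio(frame_8_3(dim), x)   # unit-norm vectors, each repeated twice
ratio_8_4 = energy_ratio(frame_8_4(dim), x)   # redundant, but vectors are not unit norm
print(ratio_8_3, ratio_8_4)
```

Both frames are redundant, yet only the unit-norm one reports its redundancy through the frame bound.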
8.3. The Frame Operator, Dual Frame, and Biorthogonality

The frame operator $\mathcal{F}$ associated with a frame $\{f_n\}$ in a Hilbert space $\mathcal{H}$ is a linear operator defined as
follows:

$$\mathcal{F}x = \sum_{n=1}^{\infty} \langle x, f_n \rangle f_n. \qquad (8.2)$$

The summation can be shown to be convergent using the definition of the frame. The frame operator $\mathcal{F}$
takes a vector $x \in \mathcal{H}$ and produces another vector in $\mathcal{H}$. It can be shown that the norm of $\mathcal{F}x$ is bounded
as follows:

$$A\|x\| \le \|\mathcal{F}x\| \le B\|x\|. \qquad (8.3)$$

The frame operator is a bounded linear operator (since $B < \infty$), hence a continuous operator [Naylor and
Sell, 1982]. Its inverse is also a bounded linear operator (since $A > 0$).

From Eq. (8.2) we have $\langle \mathcal{F}x, x \rangle = \sum_n |\langle x, f_n \rangle|^2$ by interchanging the inner product with the infinite
summation. [This is permitted by the continuity of the operator $\mathcal{F}$ and the continuity of inner products
(Sec. 6.2).] Since $\{f_n\}$ is complete, the right hand side is positive for $x \ne 0$. Thus $\langle \mathcal{F}x, x \rangle > 0$ unless $x = 0$,
that is, $\mathcal{F}$ is a positive definite operator. The realness of $\langle \mathcal{F}x, x \rangle$ also means that $\mathcal{F}$ is self-adjoint, that is,
$\langle \mathcal{F}x, y \rangle = \langle x, \mathcal{F}y \rangle$ for any $x, y \in \mathcal{H}$ [Naylor and Sell, 1982].

The importance of the frame operator arises from the fact that if we define $g_n = \mathcal{F}^{-1} f_n$ then any $x \in \mathcal{H}$
can be expressed as

$$x = \sum_{n=1}^{\infty} \langle x, g_n \rangle f_n = \sum_{n=1}^{\infty} \langle x, f_n \rangle g_n. \qquad (8.4)$$

The sequence $\{g_n\}$ is itself a frame in $\mathcal{H}$ called the dual frame. It has frame bounds $B^{-1}$ and $A^{-1}$. Among
all representations of the form $x = \sum_n c_n f_n$, the representation $x = \sum_n \langle x, g_n \rangle f_n$ has the special property
that the energy of the coefficients is minimized, that is, $\sum_n |\langle x, g_n \rangle|^2 \le \sum_n |c_n|^2$ with equality if and only
if $c_n = \langle x, g_n \rangle$ for all $n$ [Heil and Walnut, 1989]. As argued earlier the computation of $\langle x, f_n \rangle$ from $x$ and
the inverse computation of $x$ from $\langle x, f_n \rangle$ are numerically stable operations because $B < \infty$ and $A > 0$
respectively.
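A small finite dimensional illustration of the dual frame and the minimum-energy property may help. The sketch below assumes a real, redundant three-vector frame for the plane (chosen here only for illustration); in the real case the frame operator is the matrix $\sum_n f_n f_n^T$, and the helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def frame_operator(frame):
    """2x2 matrix S with S x = sum_n <x, f_n> f_n (real case: S = sum_n f_n f_n^T)."""
    return [[sum(fn[i] * fn[j] for fn in frame) for j in range(2)] for i in range(2)]

def inverse_2x2(M):
    (p, q), (r, s) = M
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix must be nonsingular")
    return [[s / det, -q / det],
            [-r / det, p / det]]

def mat_vec(M, v):
    return [dot(row, v) for row in M]

def expand(coeffs, vectors):
    """Return sum_n coeffs[n] * vectors[n]."""
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(2)]

frame = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # redundant: f_3 = f_1 + f_2
S_inv = inverse_2x2(frame_operator(frame))
dual = [mat_vec(S_inv, fn) for fn in frame]      # g_n = F^{-1} f_n

x = [1.0, 2.0]
c_dual = [dot(x, gn) for gn in dual]
x_rec_1 = expand(c_dual, frame)                          # x = sum <x, g_n> f_n
x_rec_2 = expand([dot(x, fn) for fn in frame], dual)     # x = sum <x, f_n> g_n

# Another valid coefficient set: adding (1, 1, -1) changes nothing because f_1 + f_2 - f_3 = 0.
c_other = [c + d for c, d in zip(c_dual, [1.0, 1.0, -1.0])]
x_rec_3 = expand(c_other, frame)

energy_dual = sum(c * c for c in c_dual)
energy_other = sum(c * c for c in c_other)
print(x_rec_1, x_rec_2, x_rec_3)
print(energy_dual, energy_other)
```

All three coefficient sets reconstruct $x$, but the one obtained from the dual frame has the smallest energy, in line with the property cited from [Heil and Walnut, 1989].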

For the special case of a tight frame ($A = B$), the frame operator is particularly simple, that is $\mathcal{F}x = Ax$. In this case $g_n = \mathcal{F}^{-1} f_n = f_n/A$. Any vector $x \in \mathcal{H}$ can be expressed as

$$x = \frac{1}{A}\sum_{n=1}^{\infty} \langle x, f_n\rangle f_n \quad \text{(tight frames)}. \tag{8.5}$$

Notice also that (8.1) gives

$$\sum_{n=1}^{\infty} |\langle x, f_n\rangle|^2 = A\|x\|^2 \quad \text{(tight frames)}. \tag{8.6}$$

For a tight frame with $A = 1$, the above equations resemble the representation of $x$ using an orthonormal basis even though such a tight frame is not necessarily a basis, because of possible redundancy (Example 8.4).
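
As a small numerical illustration of (8.5) and (8.6), the sketch below uses three unit vectors spaced 120 degrees apart in $\mathbb{R}^2$. This particular frame is our own choice of example and is not taken from the text. It is tight, and the measured bound $A = 3/2$ reflects the redundancy of three unit vectors in a two dimensional space.

```python
import math

# Three unit vectors at 120-degree spacing in R^2 (an assumed example frame).
angles = [math.pi / 2 + 2 * math.pi * n / 3 for n in range(3)]
frame = [(math.cos(a), math.sin(a)) for a in angles]

def inner(x, y):
    return x[0] * y[0] + x[1] * y[1]

x = (0.7, -0.4)
coeffs = [inner(x, f) for f in frame]               # <x, f_n>

# Frame bound from (8.6): sum |<x, f_n>|^2 = A ||x||^2
A = sum(c * c for c in coeffs) / inner(x, x)

# Reconstruction from (8.5): x = (1/A) sum <x, f_n> f_n
x_rec = tuple(sum(c * f[i] for c, f in zip(coeffs, frame)) / A for i in range(2))
```

The same value of `A` is obtained for any nonzero test vector `x`, which is what distinguishes a tight frame from a general one.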

Exact frames and biorthogonality. For the special case of an exact frame (i.e., a Riesz basis) the sequence $\{f_n\}$ is minimal, and it is biorthogonal to the dual frame sequence $\{g_n\}$. This is consistent with our observation at the end of Sec. 7.4.

Summary on Frames. A sequence of vectors $\{f_n\}$ in a Hilbert space $\mathcal{H}$ is a frame if there are constants $A > 0$ and $B < \infty$ such that (8.1) holds for every vector $x \in \mathcal{H}$. Frames are complete (since $A > 0$) but not necessarily linearly independent. The constants $A$ and $B$ are called the frame bounds. A frame is tight if $A = B$. A tight frame with $A = B = 1$ and with normalized vectors ($\|f_n\| = 1$) is an orthonormal basis. For a tight frame with $\|f_n\| = 1$, the frame bound $A$ measures redundancy. Any vector $x \in \mathcal{H}$ can be expressed in either of the two ways shown in (8.4). Here $g_n = \mathcal{F}^{-1} f_n$ where $\mathcal{F}$ is the frame operator defined in (8.2). The frame operator is a bounded linear operator, and is self-adjoint (in fact positive). The sequence $\{g_n\}$ is the dual frame and has frame bounds $B^{-1}$ and $A^{-1}$. For a tight frame the frame representation reduces to (8.5). A frame is exact if deletion of any vector $f_n$ destroys the frame property. A sequence $\{f_n\}$ is an exact frame if and only if it is a Riesz basis. An exact frame $\{f_n\}$ is biorthogonal to the dual frame $\{g_n\}$.

Fig.  8.2  is  a  Venn  diagram  which  shows  the  classification  of  frames  and  bases,  and  the  relationship 
between  these.  There  are  five  classes,  each  shown  by  a  rectangle. 

Fig. 8.2. A Venn diagram showing the relation between frames and bases in a Hilbert space. Each rectangle shows a class, as indicated: frames, tight frames ($A = B$), exact frames (Riesz bases), orthonormal bases (normalized tight frames with $A = B = 1$), and bases.


9. THE STFT: INVERTIBILITY, ORTHONORMALITY AND LOCALIZATION

In Sec. 8 we saw that a vector $x$ in an infinite dimensional Hilbert space (e.g., a function $x(t)$ in $L^2$) can be expanded in terms of a sequence of vectors $\{f_n\}$ called a frame, that is $x = \sum_n \langle x, g_n\rangle f_n$. One of the most important features of frames is that the construction of the expansion coefficients $\langle x, g_n\rangle$ from $x$ as well as the reconstruction of $x$ from these coefficients are numerically stable operations because $A > 0$ and $B < \infty$, as explained in Sec. 8. Riesz bases and orthonormal bases, which are special cases of frames (Fig. 8.2), also share this numerical stability.

In Sec. 3 we tried to represent an $L^2$ function in terms of the STFT. The STFT coefficients are constructed using the integral (3.3). Denote for simplicity

$$g_{kn}(t) = v(t - nT_s)\, e^{jk\omega_s t}. \tag{9.1}$$

Then the computation of the STFT coefficients can be written as

$$X_{STFT}(k\omega_s, nT_s) = \langle x(t), g_{kn}(t)\rangle. \tag{9.2}$$

This is a linear transformation which converts $x(t)$ into a two dimensional sequence because $k$ and $n$ are integers. Our hope is to be able to reconstruct $x(t)$ using an inverse linear transformation (inverse STFT) of the form

$$x(t) = \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} X_{STFT}(k\omega_s, nT_s)\, f_{kn}(t). \tag{9.3}$$

We know that this can indeed be done in a numerically stable manner if $\{g_{kn}(t)\}$ is a frame in $L^2$ and $\{f_{kn}(t)\}$ the dual frame. The fundamental questions then are: under what conditions does $\{g_{kn}(t)\}$ constitute a frame? Under what further conditions does this become a Riesz basis, better still, an orthonormal basis? With such conditions, what are the time-frequency localization properties of the resulting STFT? The answers depend on the window $v(t)$, and the sample spacings $\omega_s$ and $T_s$.

We  will  first  construct  a  very  simple  example  which  shows  the  existence  of  orthonormal  STFT  bases, 
and  indicate  a  fundamental  disadvantage  in  the  example.  We  will  then  state  the  answers  to  the  above 
general  questions  without  proof.  Details  can  be  found  in  a  number  of  references,  e.g.,  [Heil  and  Walnut, 
1989],  [Daubechies,  1992],  and  [Benedetto  and  Frazier,  1994]. 


Fig. 9.1. The rectangular window in STFT.

Example 9.1. Orthonormal STFT Basis.

Suppose $v(t)$ is the rectangular window shown in Fig. 9.1, applied to an $L^2$ function $x(t)$. The product $x(t)v(t)$ therefore has finite duration. If we sample its Fourier transform with spacing $\omega_s = 2\pi$ we can recover $x(t)v(t)$ from these samples (this is like a Fourier series of the finite duration waveform $x(t)v(t)$). Shifting the window by successive integers, we can in this way recover successive pieces of $x(t)$ from the STFT, with sample spacing $\omega_s = 2\pi$ in the frequency domain. We see that the choice $T_s = 1$ and $\omega_s = 2\pi$ (so $\omega_s T_s = 2\pi$) leads to an STFT $X_{STFT}(k\omega_s, nT_s)$, from which we can reconstruct $x(t)$ for all $t$. The quantity $g_{kn}(t)$ becomes

$$g_{kn}(t) = v(t - n)\, e^{jk\omega_s t} = v(t - n)\, e^{j2\pi kt}. \tag{9.4}$$

Since the successive shifts of the window do not overlap, the functions $g_{kn}(t)$ are orthonormal for different values of $n$. The functions are also orthonormal for different values of $k$. Summarizing, the rectangular window of Fig. 9.1, with the time-frequency sampling durations $T_s = 1$ and $\omega_s = 2\pi$ produces an orthonormal STFT basis for $L^2$ functions. ◊

This example is reminiscent of the Nyquist sampling theorem in the sense that we can reconstruct $x(t)$ from (time-frequency) samples. But the difference is that $x(t)$ is an $L^2$ signal, not necessarily bandlimited. Note that $T_s$ and $\omega_s$ cannot be arbitrarily interchanged (even if $\omega_s T_s = 2\pi$ is preserved). Thus if we had chosen $T_s = 2$ and $\omega_s = \pi$ (preserving the product $\omega_s T_s$) we would not have obtained a basis because two successive positions of the window would be spaced too far apart and we would miss fifty percent of the signal $x(t)$.
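
The orthonormality claimed in Example 9.1 can be checked numerically. The sketch below assumes a unit-height window on $[0, 1)$ (our reading of Fig. 9.1) and approximates the inner products $\langle g_{kn}, g_{mp}\rangle$ by a Riemann sum. The grid size `N` and the function names are hypothetical choices for illustration.

```python
import cmath

N = 64  # samples per unit interval (assumed; adequate while |k - m| < N)

def v(t):
    """Unit-height rectangular window on [0, 1)."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def g(k, n, t):
    """g_kn(t) = v(t - n) exp(j 2 pi k t), i.e. T_s = 1 and omega_s = 2 pi."""
    return v(t - n) * cmath.exp(2j * cmath.pi * k * t)

def inner(k, n, m, p, t_min=-4, t_max=4):
    """Riemann-sum approximation of <g_kn, g_mp> over [t_min, t_max)."""
    total = 0.0
    for i in range((t_max - t_min) * N):
        t = t_min + i / N
        total += g(k, n, t) * g(m, p, t).conjugate()
    return total / N

same = inner(1, 0, 1, 0)            # same k and n: unit norm
diff_freq = inner(1, 0, 2, 0)       # same window position, different k
diff_shift = inner(2, 0, 2, 1)      # same k, non-overlapping window positions
```

Here `same` is close to one while `diff_freq` and `diff_shift` vanish up to rounding, as expected for an orthonormal set.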

9.1. Time-Frequency Sampling Density for Frames and Orthonormal Bases

Let us assume that $v(t)$ is normalized to have unit energy, that is $\int |v(t)|^2\, dt = 1$ so that $\|g_{kn}(t)\| = 1$ for all $k, n$. If we impose the condition that $\{g_{kn}(t)\}$ be a frame, then it can be shown that the frame bounds satisfy the condition

$$A \le \frac{2\pi}{\omega_s T_s} \le B \tag{9.5}$$

regardless of how $v(t)$ is chosen. Since an orthonormal basis is a tight frame with $A = B = 1$, an STFT orthonormal basis must have $\omega_s T_s = 2\pi$.

It can further be shown that if $\omega_s T_s > 2\pi$, then $\{g_{kn}(t)\}$ cannot be a frame. For $\omega_s T_s < 2\pi$ we can find frames by appropriate choice of window $v(t)$. The critical time-frequency sampling density is $(\omega_s T_s)^{-1} = (2\pi)^{-1}$. If the density is smaller we cannot have frames and if it is larger we cannot have an orthonormal basis, but only frames.

Orthonormal STFT bases have poor time-frequency localization. Thus if we wish to have an orthonormal STFT basis, the time-frequency density is constrained to be such that $\omega_s T_s = 2\pi$. Under this condition suppose we choose $v(t)$ appropriately to design such a basis. The time frequency localization properties of this system can be judged by computing the mean square durations $D_t^2$ and $D_f^2$ defined in (3.6). It has been shown by Balian and Low [Daubechies, 1992], [Benedetto and Frazier, 1994] that one of these is necessarily infinite no matter how we design $v(t)$. Thus an orthonormal STFT basis always satisfies $D_t D_f = \infty$. That is, either the time localization or the frequency resolution is very poor. This is summarized in the following theorem.

Theorem 9.1. Let the window $v(t)$ be such that $\{g_{kn}(t)\}$ in (9.1) is an orthonormal basis for $L^2$ (which means, in particular that $\omega_s T_s = 2\pi$). Define the rms durations $D_t$ and $D_f$ for the window $v(t)$ as usual [Eq. (3.6)]. Then either $D_t = \infty$ or $D_f = \infty$. ◊

Return now to Example 9.1 where we constructed an orthonormal STFT basis using the rectangular window of Fig. 9.1. Here $T_s = 1$ and $\omega_s = 2\pi$ (so that $\omega_s T_s = 2\pi$). The window $v(t)$ has finite mean square duration $D_t^2$. Its Fourier transform $V(\omega)$ has magnitude $|V(\omega)| = |\sin(\omega/2)/(\omega/2)|$ so that $\int \omega^2 |V(\omega)|^2\, d\omega$ is not finite. This demonstrates the result of Theorem 9.1. One can try to replace the window $v(t)$ with something for which $D_t D_f$ is finite but this cannot be done without violating orthonormality.
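
The divergence of the frequency-domain integral for the rectangular window can be made explicit with a short calculation. This is a sketch, taking the stated magnitude $|V(\omega)| = |\sin(\omega/2)/(\omega/2)|$ as given and introducing a finite truncation limit $W$ (our notation) before letting it grow:

```latex
\omega^2 |V(\omega)|^2
  = \omega^2 \, \frac{\sin^2(\omega/2)}{(\omega/2)^2}
  = 4 \sin^2(\omega/2)
  = 2\,(1 - \cos\omega),
\qquad
\int_{-W}^{W} \omega^2 |V(\omega)|^2 \, d\omega
  = 4W - 4\sin W \;\longrightarrow\; \infty
  \quad \text{as } W \to \infty .
```

The truncated integral grows linearly with $W$, so the mean square duration in frequency cannot be finite for this window.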

Instability of the Gabor transform. Gabor constructed the STFT using a Gaussian window $v(t)$. In this case the sequence of functions $\{g_{kn}(t)\}$ can be shown to be complete in $L^2$ (in the sense defined in Sec. 7.2) as long as $\omega_s T_s \le 2\pi$. However, if $\omega_s T_s = 2\pi$ then the system is not a frame because it can be shown that $A = 0$ in (8.1)! Thus the reconstruction of $x(t)$ from $X_{STFT}(k\omega_s, nT_s)$ is unstable if $\omega_s T_s = 2\pi$ (see Sec. 8) even though $\{g_{kn}(t)\}$ is complete. So even though the Gabor transform has the ideal time frequency localization (minimum $D_t D_f$), it cannot provide a stable basis, hence certainly not an orthonormal basis, whenever $\omega_s T_s = 2\pi$.

Since an orthonormal STFT basis is not possible if $\omega_s T_s \neq 2\pi$, this shows that we can never have an orthonormal basis with the Gabor transform (Gaussian windowed STFT), no matter how we choose $\omega_s$ and $T_s$. The Gabor example also demonstrates the fact that even if we successfully construct a complete set of functions (not necessarily a basis) to represent $x(t)$, it may not be useful because of instability of reconstruction. If we construct Riesz bases (e.g., orthonormal bases) or more generally frames, this disadvantage goes away. For example with the Gabor transform if we let $\omega_s T_s < 2\pi$ then all is well: we get a frame (so $A > 0$ and $B < \infty$ in (8.1)); we have stable reconstruction and good time frequency localization, but not orthonormality. Fig. 9.2 summarizes these results pertaining to the time-frequency product $\omega_s T_s$ in the STFT.
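
The three regimes of the product $\omega_s T_s$ described in this section can be collected in a small helper. This is only a sketch: the function name, the returned labels and the tolerance are our own, and the labels say what is possible for a suitable window, not what a particular window achieves.

```python
import math

def stft_regime(omega_s, T_s, rel_tol=1e-12):
    """Classify the time-frequency sampling product omega_s * T_s."""
    product = omega_s * T_s
    if math.isclose(product, 2 * math.pi, rel_tol=rel_tol):
        # Orthonormal bases can exist only here, with D_t * D_f infinite.
        return "critical"
    if product > 2 * math.pi:
        # Sampling density too low: no frames for any window.
        return "no frames"
    # Density above critical: frames for suitable windows, never orthonormal bases.
    return "frames only"

regimes = [stft_regime(2 * math.pi, 1.0),
           stft_regime(math.pi, 1.0),
           stft_regime(4.0, 2.0)]
```

Note that the product alone does not decide the outcome. The rectangular window of Example 9.1 with $T_s = 2$ and $\omega_s = \pi$ also falls in the critical class, yet does not yield a basis.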

A major advantage of the wavelet transform over the STFT is that it is free from the above difficulties. For example we can obtain an orthonormal basis for $L^2$ with excellent time-frequency localization (finite, controllable $D_t D_f$). We will also see how to constrain such a wavelet to have the additional property of regularity or smoothness. Regularity is a property which is measured by the continuity and differentiability of $\psi(t)$. More precisely it is quantified by the Hölder index (to be defined in Sec. 13.1). In the next few sections where we construct wavelets based on paraunitary filter banks, we will see how to achieve all this systematically.


Fig. 9.2. Behavior of STFT representations for various regions of the time-frequency sampling product $\omega_s T_s$. The curve $\omega_s T_s = 2\pi$ is critical; see text. (Region label recovered from the figure: $\omega_s T_s < 2\pi$, good tight frames possible.)


10.  WAVELETS  AND  MULTIRESOLUTION 

In Sec. 11-13 we will show how to construct compactly supported wavelets systematically to obtain orthonormal bases for $L^2$. The construction is such that excellent time-frequency localization is possible. Moreover the smoothness or regularity of the wavelets can be controlled. The construction is based on the two channel paraunitary filter bank described in Sec. 4. In that section the synthesis filters are denoted as $G_s(z)$ and $H_s(z)$ with impulse responses $g_s(n)$ and $h_s(n)$ respectively.

All constructions are based on obtaining the wavelet $\psi(t)$ and an auxiliary function $\phi(t)$ called the scaling function, from the impulse response sequences $g_s(n)$ and $h_s(n)$. We will do this by using time domain recursions of the form

$$\phi(t) = 2 \sum_{n=-\infty}^{\infty} g_s(n)\, \phi(2t - n), \qquad \psi(t) = 2 \sum_{n=-\infty}^{\infty} h_s(n)\, \phi(2t - n), \tag{10.1}$$

called dilation equations. Equivalently in the frequency domain

$$\Phi(\omega) = G_s(e^{j\omega/2})\, \Phi(\omega/2), \qquad \Psi(\omega) = H_s(e^{j\omega/2})\, \Phi(\omega/2). \tag{10.2}$$

It turns out that if $\{G_s(z), H_s(z)\}$ is a paraunitary pair with further mild conditions (e.g., that the lowpass filter $G_s(e^{j\omega})$ has a zero at $\pi$ and no zeros in $[0, \pi/3]$) the recursions can be solved to obtain $\psi(t)$ which gives rise to an orthonormal wavelet basis $\{2^{k/2}\psi(2^k t - n)\}$ for $L^2$. By constraining $G_s(e^{j\omega})$ to have a sufficient number of zeros at $\pi$ we can further control the Hölder index (or regularity) of $\psi(t)$ as we see in Sec. 13.

Our immediate aim is to give an explanation for the occurrence of the function $\phi(t)$, and the curious recursions (10.1) called the dilation equations or two-scale equations. These have origin in the beautiful theory of multiresolution for $L^2$ spaces [Meyer, 1986], [Mallat, 1989]. Since multiresolution theory lays the foundation for the construction of the most practical wavelets to date, we give a brief description of it here.

10.1.  The  Idea  of  Multiresolution 

Return to Fig. 2.13(a) where we interpreted the wavelet transformation as a bank of continuous time analysis filters followed by samplers, and the inverse transformation as a bank of synthesis filters. Assume for simplicity the filters are ideal bandpass. Fig. 2.13(b) is a sketch of the frequency responses. The bandpass filters $F_k(\omega)$ get narrower and narrower as $k$ decreases (i.e., as $k$ becomes more and more negative). Instead of letting $k$ be negative, suppose we keep only $k \ge 0$ and include a lowpass filter $\Phi(\omega)$ to cover the low frequency region. Then we get the picture of Fig. 10.1. This is analogous to Fig. 2.12 where we used the pulse function $\phi(t)$ instead of using negative $k$ in $\psi(2^k t - n)$.


Fig. 10.1. The lowpass function $\Phi(\omega)$, bandpass function $\Psi(\omega)$, and the stretched bandpass filters $F_k(\omega)$.

Imagine for a moment that $\Phi(\omega)$ is an ideal lowpass filter with cutoff $\pm\pi$. Then we can represent any $L^2$ function $F(\omega)$ with support restricted to $\pm\pi$ in the form $F(\omega) = \sum_{n=-\infty}^{\infty} a_n \Phi(\omega)\, e^{-j\omega n}$. This is simply the Fourier series expansion of $F(\omega)$ in $[-\pi, \pi]$, and it follows that $\sum_n |a_n|^2 < \infty$ (Theorem 6.1). In the time domain this means

$$f(t) = \sum_{n=-\infty}^{\infty} a_n\, \phi(t - n). \tag{10.3}$$

Let us denote by $V_0$ the closure of the span of $\{\phi(t - n)\}$. Thus, $V_0$ is the class of $L^2$ signals that are bandlimited to $[-\pi, \pi]$. Since $\phi(t)$ is the sinc function, the shifted functions $\{\phi(t - n)\}$ form an orthonormal basis for $V_0$.

Consider now the subspace $W_0 \subset L^2$ of bandpass functions bandlimited to $\pi < |\omega| < 2\pi$. The bandpass sampling theorem (Sec. 2.1) allows us to reconstruct such a bandpass signal $g(t)$ from its samples $g(n)$ by using the ideal filter $\Psi(\omega)$. Denoting the impulse response of $\Psi(\omega)$ by $\psi(t)$ we see that $\{\psi(t - n)\}$ spans $W_0$. It can be verified that $\{\psi(t - n)\}$ is an orthonormal basis for $W_0$. Moreover, since $\Psi(\omega)$ and $\Phi(\omega)$ do not overlap, it follows from Parseval's theorem that $W_0$ is orthogonal to $V_0$.

Next consider the space of all signals of the form $f(t) + g(t)$ where $f(t) \in V_0$ and $g(t) \in W_0$. This space is called the direct sum (or orthogonal sum) of $V_0$ and $W_0$, and is denoted as $V_1 = V_0 \oplus W_0$. It is the space of all $L^2$ signals bandlimited to $[-2\pi, 2\pi]$. We can continue in this manner and define the spaces $V_k$ and $W_k$ for all $k$. Then $V_k$ is the space of all $L^2$ signals bandlimited to $[-2^k\pi, 2^k\pi]$. And $W_k$ is the space of $L^2$ functions bandlimited to $2^k\pi < |\omega| < 2^{k+1}\pi$. The general recursive relation is $V_{k+1} = V_k \oplus W_k$. Fig. 10.2 demonstrates this for the case where the filters are ideal bandpass. Only the positive half of the frequency axis is shown for simplicity.


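Fig. 10.2. Towards multiresolution analysis. The spaces $\{V_k\}$ and $\{W_k\}$ spanned by various filter responses. (The frequency axis $\omega$ is marked at $\pi$, $2\pi$, $4\pi$ and $8\pi$, with one band for a space $V_k$ followed by successive bands for the spaces $W_k$.)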


It is clear that we could imagine $V_0$ itself to be composed of subspaces $V_{-1}$ and $W_{-1}$. Thus $V_0 = V_{-1} \oplus W_{-1}$, $V_{-1} = V_{-2} \oplus W_{-2}$, and so forth. In this way we have defined a sequence of spaces $\{V_k\}$ and $\{W_k\}$ for all integers $k$ such that the following conditions are true:

$$V_{k+1} = V_k \oplus W_k, \quad \text{and} \quad W_k \perp W_m, \; k \neq m, \tag{10.4}$$

where $\perp$ means "orthogonal". That is, the functions in $W_k$ are orthogonal to those in $W_m$. It is clear that $V_k \subset V_{k+1}$.

We will see later that even if the ideal filters $\Phi(\omega)$ and $\Psi(\omega)$ are replaced with non ideal approximations, we can sometimes define sequences of subspaces $V_k$ and $W_k$ satisfying the above conditions. The importance of this observation is this: whenever $\Psi(\omega)$ and $\Phi(\omega)$ are such that we can construct such a subspace structure, the impulse response $\psi(t)$ of the filter $\Psi(\omega)$ can be used to generate an orthonormal wavelet basis! While this might seem too complicated and roundabout, we will see that the construction of the function $\phi(t)$ is quite simple and elegant, and simplifies the construction of orthonormal wavelet bases. A realization of these ideas based on paraunitary filter banks will be presented in Sec. 11. It is now time to be more precise with definitions as well as statements of the results.

Definition 10.1. Multiresolution analysis. Consider a sequence of closed subspaces $\{V_k\}$ in $L^2$ satisfying the following six properties.

1. Ladder property. $\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots$

2. $\bigcap_{k=-\infty}^{\infty} V_k = \{0\}$.

3. Closure of $\bigcup_{k=-\infty}^{\infty} V_k$ is equal to $L^2$.

4. Scaling property. $x(t) \in V_k$ if and only if $x(2t) \in V_{k+1}$. Since this implies "$x(t) \in V_0$ if and only if $x(2^k t) \in V_k$", all the spaces $V_k$ are scaled versions of the space $V_0$. For $k > 0$, $V_k$ is a finer space than $V_0$.

5. Translation invariance. If $x(t) \in V_0$ then $x(t - n) \in V_0$, that is, the space $V_0$ is invariant to translations by integers. By the previous property this means that $V_k$ is invariant to translations by $2^{-k}n$.

6. Special orthonormal basis. There exists a function $\phi(t) \in V_0$ such that the integer shifted versions $\{\phi(t - n)\}$ form an orthonormal basis for $V_0$. By property 4 this means that $\{2^{k/2}\phi(2^k t - n)\}$ is an orthonormal basis for $V_k$. The function $\phi(t)$ is called the scaling function of multiresolution analysis. ◊

Comments on the definition. Notice that the scaling function $\phi(t)$ determines $V_0$, hence all $V_k$. We say that $\phi(t)$ generates the entire multiresolution analysis $\{V_k\}$. The sequence $\{V_k\}$ is said to be a ladder of subspaces because of the inclusion property $V_k \subset V_{k+1}$. The technical terms closed and closure, which originate from metric space theory, have simple meaning in our context (because $L^2$ is a Hilbert space). The subspace $V_k$ is "closed" if the following is true: whenever a sequence of functions $\{f_n(t)\} \in V_k$ converges to a limit $f(t) \in L^2$ (i.e., $\|f(t) - f_n(t)\| \to 0$ as $n \to \infty$), the limit $f(t)$ is in $V_k$ itself. In general an infinite union of closed sets is not closed; that is why we need to take "closure" in the third property above. The third property simply means that any element $x(t) \in L^2$ can be approximated arbitrarily closely (in the $L^2$-norm sense) by an element in $\bigcup_{k=-\infty}^{\infty} V_k$.


General meaning of $W_k$. In the general setting of the above definition, the subspace $W_k$ is defined as the orthogonal complement of $V_k$ with respect to $V_{k+1}$. Thus the relation $V_{k+1} = V_k \oplus W_k$, which was valid in the ideal bandpass case (Fig. 10.2), continues to hold.

The Haar Multiresolution. A simple example of multiresolution where $\Phi(\omega)$ is not ideal lowpass is the Haar multiresolution, generated by the function $\phi(t)$ in Fig. 10.3(a). Here $V_0$ is the space of all functions that are piecewise constants on intervals of the form $[n, n + 1]$. We will see later that the function $\psi(t)$ associated with this example is as in Fig. 10.3(b); the space $W_0$ is spanned by $\{\psi(t - n)\}$. The space $V_k$ contains functions which are constants in $[2^{-k}n, 2^{-k}(n + 1)]$. Fig. 10.3(c) and (d) show examples of functions belonging to $V_0$ and $V_1$. For this example, the six properties in the definition of multiresolution are particularly clear (except perhaps property 3, which can be proved too).

The multiresolution analysis generated by the ideal bandpass filters (Figs. 10.1, 10.2) is another simple example, where $\phi(t)$ is the sinc function. We see that the two elementary orthonormal wavelet examples (Haar wavelet and the ideal bandpass wavelet) also generate a corresponding multiresolution analysis. The connection between wavelets and multiresolution is deeper than this, and is elaborated in Sec. 10.2.


Fig. 10.3. The Haar multiresolution example. (a) The scaling function $\phi(t)$ that generates multiresolution, (b) the function $\psi(t)$ which generates $W_0$, (c) example of a member of $V_0$ and (d) example of a member of $V_1$.

Derivation of the Dilation Equation

Since $\{\sqrt{2}\,\phi(2t - n)\}$ is an orthonormal basis for $V_1$ (see property 6) and since $\phi(t) \in V_0 \subset V_1$ we see that $\phi(t)$ can be expressed as a linear combination of the functions $\{\sqrt{2}\,\phi(2t - n)\}$. Let us write

$$\phi(t) = 2 \sum_{n=-\infty}^{\infty} g_s(n)\, \phi(2t - n) \quad \text{(dilation equation)}. \tag{10.5}$$

Thus the dilation equation arises naturally out of the multiresolution condition. For example, the Haar scaling function $\phi(t)$ satisfies the dilation equation

$$\phi(t) = \phi(2t) + \phi(2t - 1). \tag{10.6}$$

The notation $g_s(n)$ and the factor 2 in the dilation equation might appear arbitrary now, but are convenient for future use. Orthonormality of $\{\phi(t - n)\}$ implies that $\|\phi(t)\| = 1$, and that $\{\sqrt{2}\,\phi(2t - n)\}$ are orthonormal. So $\sum_n |g_s(n)|^2 = 0.5$ from (10.5).
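
The Haar case can be verified directly. The following sketch assumes the Haar scaling function is the unit pulse on $[0, 1)$, so that (10.6) corresponds to $g_s(0) = g_s(1) = 1/2$ in (10.5). It checks the dilation equation on a grid of sample points and evaluates $\sum_n |g_s(n)|^2$. The grid and variable names are our own.

```python
def phi(t):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

# Coefficients in phi(t) = 2 * sum_n g_s(n) * phi(2t - n)
g_s = {0: 0.5, 1: 0.5}

def dilation_rhs(t):
    """Right hand side of the dilation equation (10.5)."""
    return 2.0 * sum(c * phi(2.0 * t - n) for n, c in g_s.items())

grid = [i / 64 - 1.0 for i in range(192)]            # sample points in [-1, 2)
max_error = max(abs(phi(t) - dilation_rhs(t)) for t in grid)
energy = sum(c * c for c in g_s.values())            # should equal 0.5
```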

Example  10.1.  Non  orthonormal  multiresolution. 


Fig. 10.4. Example of a scaling function $\phi(t)$ generating nonorthogonal multiresolution. (a) The scaling function, and (b) demonstrating the dilation equation.


Consider the triangular function shown in Fig. 10.4(a). This has $\|\phi(t)\| = 1$ and satisfies the dilation equation

$$\phi(t) = \phi(2t) + 0.5\,\phi(2t - 1) + 0.5\,\phi(2t + 1) \tag{10.7}$$

as demonstrated in Fig. 10.4(b). With $V_k$ denoting the closure of the span of $\{2^{k/2}\phi(2^k t - n)\}$ it can be shown that the spaces $\{V_k\}$ satisfy all the conditions in the multiresolution definition, except one. Namely, $\{\phi(t - n)\}$ does not form an orthonormal basis [for example compare $\phi(t)$ and $\phi(t - 1)$]. We will see later (Example 10.2) that it does form a Riesz basis and that it can be converted into an orthonormal basis by orthonormalization. This example is a special case of a family of scaling functions called spline functions. ◊
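
Both claims of Example 10.1 can be checked numerically. The sketch below assumes a triangle supported on $[-1, 1]$ and, for simplicity, uses unit height instead of the unit-norm scaling of the example. Since (10.7) is linear and homogeneous in $\phi$, the scaling does not affect the check. The grid and helper names are our own.

```python
def hat(t):
    """Unit-height triangle supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(t))

def dilation_rhs(t):
    """Right hand side of (10.7)."""
    return hat(2 * t) + 0.5 * hat(2 * t - 1) + 0.5 * hat(2 * t + 1)

N = 256
grid = [i / N - 2.0 for i in range(4 * N + 1)]        # sample points in [-2, 2]
max_error = max(abs(hat(t) - dilation_rhs(t)) for t in grid)

def inner(shift):
    """Riemann-sum approximation of the integral of hat(t) * hat(t - shift)."""
    return sum(hat(t) * hat(t - shift) for t in grid) / N

# The exact integral of hat(t) * hat(t - 1) is 1/6, which is not zero,
# so adjacent integer shifts are not orthogonal.
overlap = inner(1)
```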

We will see below that starting from an orthonormal multiresolution system (in particular from the function $\phi(t)$) one can generate an orthonormal wavelet basis for $L^2$. The wavelet bases generated from splines $\phi(t)$ (after orthonormalization) are called spline wavelets [Chui, 1992a,b]. These are also called the Battle-Lemarié family of wavelets. The link between multiresolution analysis and wavelets will be explained quantitatively in Sec. 10.2.

Multiresolution  Approximation  of  Functions. 

Given a multiresolution analysis, we know that $\bigcap_{k=-\infty}^{\infty} V_k = \{0\}$ and that the closure of $\bigcup_{k=-\infty}^{\infty} V_k$ is $L^2$. From this it can be shown that the $W_k$'s make up the entire $L^2$ space, that is

$$L^2 = \bigoplus_{k=-\infty}^{\infty} W_k. \tag{10.8a}$$

We can approximate an arbitrary $L^2$ function $x(t)$ to a certain degree of accuracy by projecting it onto $V_k$ for appropriate $k$. Thus let $x_k(t)$ be this orthogonal projection (Sec. 2.2). Suppose we increase $k$ to $k + 1$. Since $V_{k+1} = V_k \oplus W_k$ and $W_k$ is orthogonal to $V_k$, we see that the new approximation $x_{k+1}(t)$ (projection onto the finer space $V_{k+1}$) is given by $x_{k+1}(t) = x_k(t) + y_k(t)$ where $y_k(t)$ is in $W_k$.

Thus, when we go from scale $k$ to scale $k + 1$ we go to a bigger space $V_{k+1} \supset V_k$ which permits a finer approximation. This is nicely demonstrated in the two extreme examples mentioned above. For the example with ideal filters (Figs. 10.1, 10.2), the process of passing from scale $k$ to $k + 1$ is like admitting higher frequency components, which are orthogonal to the existing lowpass components. For the Haar example (Fig. 10.3) where $\psi(t)$ and $\phi(t)$ are square pulses, when we pass from $k$ to $k + 1$ we permit finer pulses (i.e., highly localized finer variations in time domain). For this example, Figs. 10.3(c) and (d) demonstrate the projections $x_k(t)$ and $x_{k+1}(t)$ at two successive resolutions. The projections are piecewise-constant approximations of an $L^2$ signal $x(t)$.
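
For the Haar multiresolution, projection onto $V_k$ amounts to averaging over intervals of length $2^{-k}$. The sketch below illustrates $x_{k+1}(t) = x_k(t) + y_k(t)$ on sampled data. It assumes the signal is itself piecewise constant on a fine grid of $2^J$ samples per unit time (so it lies in $V_J$), and the test samples are arbitrary values of our own choosing.

```python
J = 4                                                  # 2**J samples per unit time
x = [((7 * i) % 11) / 10.0 for i in range(2 * 2**J)]   # arbitrary samples on [0, 2)

def project(samples, k):
    """Haar projection onto V_k: average over blocks of 2**(J - k) samples,
    i.e. over intervals of length 2**(-k)."""
    block = 2 ** (J - k)
    out = []
    for start in range(0, len(samples), block):
        mean = sum(samples[start:start + block]) / block
        out.extend([mean] * block)
    return out

x0 = project(x, 0)                         # coarse approximation x_k(t), k = 0
x1 = project(x, 1)                         # finer approximation x_{k+1}(t)
y0 = [b - a for a, b in zip(x0, x1)]       # detail y_k(t), a member of W_0

detail_dot_coarse = sum(a * b for a, b in zip(x0, y0))
err0 = sum((a - b) ** 2 for a, b in zip(x, x0))
err1 = sum((a - b) ** 2 for a, b in zip(x, x1))
```

The detail `y0` is orthogonal to the coarse approximation `x0`, and the approximation error does not increase when passing to the finer scale.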

By repeated application of $V_{k+1} = V_k \oplus W_k$ we can express $V_0$ as

$$V_0 = \bigoplus_{k=-\infty}^{-1} W_k, \tag{10.8b}$$

which, together with (10.8a) yields

$$L^2 = V_0 \oplus W_0 \oplus W_1 \oplus W_2 \oplus \cdots \tag{10.8c}$$

This has a nice interpretation based on Fig. 10.2. The $L^2$ signal $x(t)$ has been decomposed into orthogonal components belonging to $V_0$ (lowpass component), $W_0$ (bandpass component), $W_1$ (bandpass with higher bandwidth and center frequency), and so forth.

We can find an infinite number of multiresolution examples by choosing $\phi(t)$ appropriately. It is more important now to obtain systematic techniques for constructing such examples. The quality of the example is governed by the quality of $\psi(t)$ and $\phi(t)$, that is, the time localization and frequency resolution they can provide, the smoothness (regularity) of these functions, and the ease with which we can implement these approximations.

10.2.  Relation  between  Multiresolution  and  Wavelets 

Suppose $\phi(t) \in L^2$ generates an orthonormal multiresolution $\{V_k\}$ as defined in Sec. 10.1. We know $\phi(t) \in V_0$ and that $\{\phi(t - n)\}$ is an orthonormal basis for $V_0$. Moreover $\phi(t)$ satisfies the dilation equation (10.5), and the sequence $\{g_s(n)\} \in \ell^2$ defines the filter $G_s(e^{j\omega})$.

Now consider the finer space $V_1 = V_0 \oplus W_0$, where $W_0$ is orthogonal to $V_0$. If $f(t) \in W_0$ then $f(t) \in V_1$ so it is a linear combination of $\sqrt{2}\,\phi(2t - n)$ (property 6). Using this and the fact that $W_0$ is orthogonal to $V_0$ we can show that $F(\omega)$ (the $L^2$-FT of $f(t)$) has a special form. This is given by

$$F(\omega) = e^{j\omega/2}\, G_s^*(-e^{j\omega/2})\, \Phi(\omega/2)\, H(e^{j\omega}),$$

where $H(e^{j\omega})$ is $2\pi$-periodic. The special case of this with $H(e^{j\omega}) = 1$ will be denoted $\Psi(\omega)$, that is,

$$\Psi(\omega) = e^{j\omega/2}\, G_s^*(-e^{j\omega/2})\, \Phi(\omega/2). \tag{10.9}$$

The above definition of $\Psi(\omega)$ is equivalent to

$$\psi(t) = 2 \sum_{n=-\infty}^{\infty} (-1)^{n+1} g_s^*(-n - 1)\, \phi(2t - n) \quad \text{(dilation equation for } \psi(t)\text{)}. \tag{10.10}$$

The function $\psi(t)$ satisfying this equation has some useful properties. First, it is in $L^2$. This follows from Theorem 6.2, since $\sum_n |g_s(n)|^2$ is finite. It can be shown that $\psi(t - n) \in W_0$ and that $\{\psi(t - n)\}$ is an orthonormal basis for $W_0$. This implies that $\{2^{k/2}\psi(2^k t - n)\}$ is an orthonormal basis for $W_k$ (because $f(t) \in W_0$ if and only if $f(2^k t) \in W_k$, which is a property induced by the scaling property, i.e., property 4 in the definition of multiresolution). In view of (10.8a) we conclude that the sequence $\{2^{k/2}\psi(2^k t - n)\}$, with $k$ and $n$ varying over all integers, forms a basis for $L^2$. Summarizing we have the following result:

Theorem 10.1. Let $\phi(t) \in L^2$ generate an orthonormal multiresolution, i.e., a ladder of spaces $\{V_k\}$ satisfying the six properties in Definition 10.1. That is, in particular, $\{\phi(t - n)\}$ is an orthonormal basis for $V_0$. Then $\phi(t)$ satisfies the dilation equation (10.5) for some $g_s(n)$ with $\sum_n |g_s(n)|^2 = 0.5$. Define the function $\psi(t)$ according to the dilation equation (10.10). Then $\psi(t) \in W_0 \subset L^2$, and $\{\psi(t - n)\}$ is an orthonormal basis for $W_0$. Therefore $\{2^{k/2}\psi(2^k t - n)\}$ is an orthonormal basis for $W_k$, just as $\{2^{k/2}\phi(2^k t - n)\}$ is an orthonormal basis for $V_k$ (for fixed $k$). Moreover with $k$ and $n$ varying over all integers, the doubly indexed sequence $\{2^{k/2}\psi(2^k t - n)\}$ is an orthonormal wavelet basis for $L^2$. ◊

Thus, to construct a wavelet basis for $L^2$ we only have to construct an orthonormal basis $\{\phi(t - n)\}$ for $V_0$. Everything else follows from that. All proofs can be found in a number of references, e.g., [Mallat, 1989], [Chui, 1992a], and [Daubechies, 1992].
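
To make the theorem concrete, the sketch below applies the coefficient rule in (10.10) to the Haar case, assuming $g_s(0) = g_s(1) = 1/2$ and the unit pulse on $[0, 1)$ as $\phi(t)$. It builds $\psi(t)$ and checks numerically that it has unit norm, is orthogonal to its own unit shift, and is orthogonal to the shifts of $\phi(t)$. The helper names and the sampling grid are our own.

```python
def wavelet_filter(g_s):
    """Coefficients (-1)**(n+1) * conj(g_s(-n-1)) from (10.10), as {n: value}."""
    h_s = {}
    for m, value in g_s.items():
        n = -m - 1
        sign = 1 if (n + 1) % 2 == 0 else -1
        h_s[n] = sign * value.conjugate()
    return h_s

def phi(t):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

g_s = {0: 0.5, 1: 0.5}
h_s = wavelet_filter(g_s)

def psi(t):
    """psi(t) = 2 * sum_n h_s(n) * phi(2t - n)"""
    return 2.0 * sum(c * phi(2.0 * t - n) for n, c in h_s.items())

N = 64
grid = [i / N - 4.0 for i in range(8 * N)]            # sample points in [-4, 4)

def inner(f, g):
    """Riemann-sum inner product (exact here, since all functions are
    piecewise constant on the dyadic grid)."""
    return sum(f(t) * g(t) for t in grid) / N

norm_sq = inner(psi, psi)
shift_overlap = inner(psi, lambda t: psi(t - 1.0))
cross = max(abs(inner(psi, lambda t, n=n: phi(t - n))) for n in range(-3, 3))
```

With the index convention of (10.10) the resulting Haar wavelet is supported on $[-1, 0)$, a shifted version of the familiar Haar wavelet. A shift of this kind does not affect the orthonormality properties being checked.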

10.3.  Relation  Between  Muitiresolution  Analysis  and  Paraunitary  Filter  Banks 
Denoting 

K{n)  =  (-1)"+15:(-1  ~  n),  i.e.,  Hsie^  =  e>“G:(-e>“), 

we  see  that  (j){t)  and  ip{t)  satisfy  the  two  dilation  equations  in  (10.1).  By  construction  ip(t)  E  Wq  and 
^(t)  €  Vq.  The  fact  that  Wq  and  Vq  are  mutually  orthogonal  subspaces  can  be  used  to  show  that  Hs{e^"^) 
and  Gs(e^'^)  satisfy 

G:(e^“)£r,(e-''“)  +  G:(-e^“)£f.(-e^'“)  =  0.  (10.11) 

Moreover it can be shown that orthonormality of $\{\phi(t - n)\}$ leads to the power complementary property

$$|G_s(e^{j\omega})|^2 + |G_s(-e^{j\omega})|^2 = 1. \tag{10.12}$$

In other words, $G_s(e^{j\omega})$ is a power symmetric filter! That is, the filter $|G_s(e^{j\omega})|^2$ is a half band filter. Using
$H_s(e^{j\omega}) = e^{j\omega} G_s^*(-e^{j\omega})$, we also have

$$|H_s(e^{j\omega})|^2 + |H_s(-e^{j\omega})|^2 = 1. \tag{10.13}$$


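A compact way to express the above three equations is by defining the matrix

$$\mathbf{G}_s(e^{j\omega}) = \begin{bmatrix} G_s(e^{j\omega}) & H_s(e^{j\omega}) \\ G_s(-e^{j\omega}) & H_s(-e^{j\omega}) \end{bmatrix}.$$

The three properties (10.11)-(10.13) are equivalent to $\mathbf{G}_s^\dagger(e^{j\omega})\,\mathbf{G}_s(e^{j\omega}) = \mathbf{I}$, that is, the matrix
$\mathbf{G}_s(e^{j\omega})$ is unitary for all $\omega$. This matrix was defined in Sec. 4.4 in the context of paraunitary digital
filter banks. Thus, the filters $G_s(e^{j\omega})$ and $H_s(e^{j\omega})$ constructed from a multiresolution setup as above
constitute a paraunitary (CQF) synthesis bank.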
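
The unitarity of this matrix can be checked numerically. The following is a minimal sketch, not part of the original development: it assumes the 4-tap Daubechies lowpass coefficients (scaled so that $\sum_n g_s(n) = 1$) as an example of $g_s(n)$, builds $h_s(n) = (-1)^{n+1} g_s^*(-n-1)$, and evaluates $\mathbf{G}_s^\dagger \mathbf{G}_s - \mathbf{I}$ on a frequency grid. The helper names `freq_response`, `synthesis_matrix` and `max_unitarity_error` are hypothetical.

```python
import numpy as np

SQ3 = np.sqrt(3.0)
# Assumed example filter: 4-tap Daubechies lowpass, scaled so that sum(g_s) = 1.
# Filters are stored as {time index n: coefficient}.
g_s = dict(enumerate(np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / 8.0))
# h_s(n) = (-1)^(n+1) g_s^*(-n-1). With m = -n-1 this reads h_s(-m-1) = (-1)^m g_s^*(m),
# so h_s is supported on n = -4, ..., -1.
h_s = {-m - 1: (-1) ** m * np.conj(c) for m, c in g_s.items()}

def freq_response(taps, w):
    """DTFT sum_n taps[n] * exp(-j w n) of a finite-length filter."""
    return sum(c * np.exp(-1j * w * n) for n, c in taps.items())

def synthesis_matrix(w):
    """The 2x2 matrix G_s(e^{jw}); the argument -e^{jw} corresponds to frequency w + pi."""
    return np.array([
        [freq_response(g_s, w), freq_response(h_s, w)],
        [freq_response(g_s, w + np.pi), freq_response(h_s, w + np.pi)],
    ])

def max_unitarity_error(num_points=64):
    """Largest entry of |G^dagger G - I| over a grid of frequencies."""
    worst = 0.0
    for w in np.linspace(-np.pi, np.pi, num_points):
        M = synthesis_matrix(w)
        worst = max(worst, np.abs(M.conj().T @ M - np.eye(2)).max())
    return worst

print(max_unitarity_error())  # expected to be at the level of rounding error
```

The diagonal entries of $\mathbf{G}_s^\dagger \mathbf{G}_s$ reproduce (10.12) and (10.13), and the off-diagonal entries reproduce (10.11), so a small error here checks all three properties at once.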

Thus, orthonormal multiresolution automatically gives rise to paraunitary filter banks! Starting from
a multiresolution analysis we obtained two functions $\phi(t)$ and $\psi(t)$. These functions generate orthonormal
bases $\{\phi(t - n)\}$ and $\{\psi(t - n)\}$ for the orthogonal subspaces $V_0$ and $W_0$. The functions $\phi(t)$ and $\psi(t)$
generated in this way satisfy the dilation equation (10.1). Defining the filters $G_s(z)$ and $H_s(z)$ from the
coefficients $g_s(n)$ and $h_s(n)$ in an obvious way, we find that these filters form a paraunitary pair!

This raises the following fundamental question. If we start from a paraunitary pair $\{G_s(z), H_s(z)\}$
and define the functions $\phi(t)$ and $\psi(t)$ by (successfully) solving the dilation equations, do we obtain an
orthonormal basis $\{\phi(t - n)\}$ for multiresolution, and a wavelet basis $\{2^{k/2}\psi(2^k t - n)\}$ for the space of
$L^2$ functions? The answer, fortunately, is in the affirmative, subject to some minor requirements which can be
trivially satisfied in practice. We return to this in Sec. 11.

Generating  Wavelet  and  Multiresolution  Coefficients  From  Paraunitary  Filter  Banks 

Recall that the subspaces $V_0$ and $W_0$ have the orthonormal bases $\{\phi(t - n)\}$ and $\{\psi(t - n)\}$ respectively.
By the scaling property, the subspace $V_k$ has the orthonormal basis $\{\phi_{kn}(t)\}$, and similarly the subspace $W_k$
has the orthonormal basis $\{\psi_{kn}(t)\}$, where, as usual, $\phi_{kn}(t) = 2^{k/2}\phi(2^k t - n)$ and $\psi_{kn}(t) = 2^{k/2}\psi(2^k t - n)$.
The orthogonal projections of a signal $x(t) \in L^2$ onto $V_k$ and $W_k$ are given, respectively, by

$$P_k[x(t)] = \sum_{n=-\infty}^{\infty} \langle x(t), \phi_{kn}(t)\rangle\, \phi_{kn}(t), \quad \text{and} \quad Q_k[x(t)] = \sum_{n=-\infty}^{\infty} \langle x(t), \psi_{kn}(t)\rangle\, \psi_{kn}(t) \tag{10.14}$$

(see Sec. 2.2). Denote the scale-$k$ projection coefficients as $d_k(n) = \langle x(t), \phi_{kn}(t)\rangle$ and $c_k(n) = \langle x(t), \psi_{kn}(t)\rangle$
for simplicity. (The notation $c_{kn}$ was used in earlier sections, but $c_k(n)$ is convenient for the present
discussion.) We say that $d_k(n)$ are the multiresolution coefficients at scale $k$ and $c_k(n)$ are the wavelet coefficients
at scale $k$.

Assume that the projection coefficients $d_k(n)$ are known for some scale, say $k = 0$. We will then show
that $d_k(n)$ and $c_k(n)$ for the coarser scales, i.e., $k = -1, -2, \ldots$ can be generated by using a paraunitary
analysis filter bank $\{G_a(e^{j\omega}), H_a(e^{j\omega})\}$ corresponding to the synthesis bank $\{G_s(e^{j\omega}), H_s(e^{j\omega})\}$ (Sec. 4.4).
We know $\phi(t)$ and $\psi(t)$ satisfy the dilation equations (10.1). By substituting the dilation equations into the
righthand sides of $\phi_{kn}(t) = 2^{k/2}\phi(2^k t - n)$ and $\psi_{kn}(t) = 2^{k/2}\psi(2^k t - n)$, we obtain

$$\phi_{kn}(t) = \sqrt{2} \sum_{m=-\infty}^{\infty} g_s(m - 2n)\, \phi_{k+1,m}(t), \quad \text{and} \quad \psi_{kn}(t) = \sqrt{2} \sum_{m=-\infty}^{\infty} h_s(m - 2n)\, \phi_{k+1,m}(t). \tag{10.15}$$

A computation of the inner products $d_k(n) = \langle x(t), \phi_{kn}(t)\rangle$ and $c_k(n) = \langle x(t), \psi_{kn}(t)\rangle$ then yields

$$d_k(n) = \sum_{m=-\infty}^{\infty} \sqrt{2}\, g_a(2n - m)\, d_{k+1}(m), \qquad c_k(n) = \sum_{m=-\infty}^{\infty} \sqrt{2}\, h_a(2n - m)\, d_{k+1}(m), \tag{10.16}$$

where $g_a(n) = g_s^*(-n)$ and $h_a(n) = h_s^*(-n)$ are the analysis filters in the paraunitary filter bank.

The beauty of these equations is that they look like discrete time convolutions! Thus, if $d_{k+1}(n)$ is
convolved with the impulse response $\sqrt{2}\, g_a(n)$ and the output decimated by two, the result is the sequence
$d_k(n)$. A similar statement follows for $c_k(n)$. The above computation can therefore be interpreted in filter
bank form as in Fig. 10.5. Because of the perfect reconstruction property of the two channel system (Fig.
4.1), it follows that we can reconstruct the projection coefficients $d_{k+1}(n)$ from the projection coefficients
$d_k(n)$ and $c_k(n)$.
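
As an illustration, one analysis stage of (10.16) and the matching synthesis stage implied by (10.15) can be sketched as follows. This is a minimal sketch under stated assumptions: the Haar filter $g_s(0) = g_s(1) = 1/2$ is chosen here only as an example, the coefficients are real (so conjugation is omitted), and sequences are stored as dictionaries from integer index to value so that the noncausal support of $h_s(n)$ needs no special handling. The function names are hypothetical.

```python
import math
from collections import defaultdict

# Assumed example filters (Haar), normalized so that sum_n g_s(n) = 1.
g_s = {0: 0.5, 1: 0.5}
# h_s(n) = (-1)^(n+1) g_s(-n-1); with m = -n-1 this is h_s(-m-1) = (-1)^m g_s(m).
h_s = {-m - 1: (-1) ** m * c for m, c in g_s.items()}
ROOT2 = math.sqrt(2.0)

def analysis_stage(d_next):
    """Eq. (10.16): map d_{k+1} to (d_k, c_k). Uses g_a(2n - m) = g_s(m - 2n) for real filters."""
    d, c = defaultdict(float), defaultdict(float)
    for m, x in d_next.items():
        for taps, out in ((g_s, d), (h_s, c)):
            for l, coeff in taps.items():
                # The term contributes to output index n only when m - 2n = l.
                if (m - l) % 2 == 0:
                    out[(m - l) // 2] += ROOT2 * coeff * x
    return dict(d), dict(c)

def synthesis_stage(d, c):
    """Rebuild d_{k+1}(m) = sqrt(2) * sum_n [g_s(m - 2n) d_k(n) + h_s(m - 2n) c_k(n)]."""
    d_next = defaultdict(float)
    for taps, seq in ((g_s, d), (h_s, c)):
        for n, x in seq.items():
            for l, coeff in taps.items():
                d_next[2 * n + l] += ROOT2 * coeff * x
    return dict(d_next)

d1 = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
d0, c0 = analysis_stage(d1)
rebuilt = synthesis_stage(d0, c0)
print(d0, c0, rebuilt)
```

Because the filter bank is paraunitary, the stage also preserves energy: the sum of squares of `d1` equals the combined sum of squares of `d0` and `c0`.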


Fig. 10.5. Generating the wavelet and multiresolution coefficients at level $k$ from level $k + 1$. (Input: the
multiresolution coefficients $d_{k+1}(n)$ at level $k+1$. Outputs: the multiresolution coefficients $d_k(n)$ and the
wavelet coefficients $c_k(n)$ at level $k$.)


Fig. 10.6. Tree structured analysis bank generating wavelet coefficients
$c_k(n)$ and multiresolution coefficients $d_k(n)$ recursively.


The  Fast  Wavelet  Transform  (FWT) 

Repeated application of this idea results in Fig. 10.6 which is a tree structured paraunitary filter
bank (Sec. 4.7) with analysis filters $\sqrt{2}\, g_a(n)$ and $\sqrt{2}\, h_a(n)$ at each stage. Thus, given the projection
coefficients $d_0(n)$ for $V_0$, we can compute the projection coefficients $d_k(n)$ and $c_k(n)$ for the coarser spaces
$V_{-1}, W_{-1}, V_{-2}, W_{-2}, \ldots$ This scheme is sometimes referred to as the Fast Wavelet Transform (FWT). Fig.
10.7 shows a schematic of the computation. In this figure, each node (heavy dot) represents a decimated
paraunitary analysis bank $\{\sqrt{2}\, g_a(n), \sqrt{2}\, h_a(n)\}$. The subspaces $W_m$ and $V_m$ are indicated in the nodes rather
than the projection coefficients.
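
The tree can be sketched in a few lines for the Haar case. This is only an illustrative sketch, with several assumptions made here: the Haar filters $g_s(0) = g_s(1) = 1/2$ are used, the input length is divisible by $2^{\text{levels}}$, and sequences are stored as arrays, so the one-sample index offset that (10.16) gives the wavelet coefficients (array position $p$ holds $c_k(p+1)$) is not tracked. The names `haar_stage` and `fwt` are hypothetical.

```python
import numpy as np

def haar_stage(d_next):
    """One node of the tree: filtering by sqrt(2) g_a, sqrt(2) h_a, then decimation by two."""
    even, odd = d_next[0::2], d_next[1::2]
    d = (even + odd) / np.sqrt(2.0)  # multiresolution coefficients at the coarser scale
    c = (odd - even) / np.sqrt(2.0)  # wavelet coefficients at the coarser scale
    return d, c

def fwt(d0, levels):
    """Return (d_{-levels}, [c_{-1}, c_{-2}, ..., c_{-levels}]) computed recursively from d_0."""
    d = np.asarray(d0, dtype=float)
    details = []
    for _ in range(levels):
        d, c = haar_stage(d)
        details.append(c)
    return d, details

d, details = fwt([1, 2, 3, 4, 5, 6, 7, 8], levels=3)
print(d, [c.tolist() for c in details])
```

Each stage works on a sequence half as long as the previous one, so the total work is proportional to the length of $d_0(n)$, which is why the scheme is called fast.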

Computation of the initial projection coefficient. Everything depends on the computation of
$d_0(n)$. Note that $d_0(n) = \langle x(t), \phi(t - n)\rangle$, which can be written as the integral $d_0(n) = \int x(t)\phi^*(t - n)\,dt$.
An elaborate computation of this integral is avoided in practice. If the scale $k = 0$ is fine enough, that is,
if $x(t)$ does not change much within the duration where $\phi(t)$ is significant, we can approximate this integral
with the sample value $x(n)$. That is, $d_0(n) \approx x(n)$. Improved approximations of $d_0(n)$ have been suggested
by other authors, see references in [Djokovic and Vaidyanathan, 1994].


Fig. 10.7. A schematic of the tree structured filter-bank which generates
the coefficients of the projections onto $V_k$ and $W_k$ (nodes labeled $W_{-1}$, $W_{-2}$, $W_{-3}$).

Continuous-time  Filter  Banks  and  Multiresolution 

The preceding discussions show the deep connection between orthonormal multiresolution analysis and
discrete time paraunitary filter banks. As shown by Eq. (10.8c), any signal $x(t)$ can be written as a sum
of its projections onto the mutually orthogonal spaces $V_0$, $W_0$, $W_1$, and so forth. That is,

$$x(t) = \sum_n d_0(n)\,\phi(t - n) + \sum_{k=0}^{\infty} \sum_n c_k(n)\, 2^{k/2}\psi(2^k t - n).$$

This decomposition itself can be given a simple filter bank interpretation, with continuous-time filters and
samplers. For this, first note that the $V_0$ component $\sum_n d_0(n)\phi(t - n)$ can be regarded as the output of a
filter with impulse response $\phi(t)$, with the input chosen as the impulse train $\sum_n d_0(n)\delta_a(t - n)$. Similarly, the
$W_k$ component $\sum_n c_k(n) 2^{k/2}\psi(2^k t - n)$ is the output of a filter with impulse response $f_k(t) = 2^{k/2}\psi(2^k t)$
in response to the input $\sum_n c_k(n)\delta_a(t - 2^{-k}n)$. This interpretation is shown by the synthesis bank of Fig.
10.8(a).

The projection coefficients $d_0(n)$ and $c_k(n)$ can also be interpreted in a nice way. For example, we have
$d_0(n) = \langle x(t), \phi(t - n)\rangle$ by orthonormality. This inner product can be explicitly written out as

$$d_0(n) = \int x(t)\,\phi^*(t - n)\,dt.$$

The integral can be interpreted as a convolution of $x(t)$ with $\phi^*(-t)$. Thus consider the output of the filter
with impulse response $\phi^*(-t)$, with the input chosen as $x(t)$. This output, sampled at time $n$, gives $d_0(n)$.
Similarly, $c_k(n)$ can be interpreted as the output of the filter $h_k(t) = 2^{k/2}\psi^*(-2^k t)$, sampled at the time
$2^{-k}n$. The analysis bank of Fig. 10.8(a) shows this interpretation. Thus, the projection coefficients $d_0(n)$
and $c_k(n)$ are the sampled versions of the outputs of an analysis filter bank.

Notice that all the filters in the filter bank are determined by the scaling function $\phi(t)$ and the wavelet
function $\psi(t)$. Every synthesis filter $f_k(t)$ is the time reversed conjugate of the corresponding analysis filter
$h_k(t)$, that is, $f_k(t) = h_k^*(-t)$ (a consequence of orthonormality). In terms of frequency responses this means
$F_k(\omega) = H_k^*(\omega)$. For completeness of the picture, Fig. 10.8(b) shows typical frequency response magnitudes
of these filters.


Fig. 10.8(a). Analysis bank and synthesis bank (multiresolution analysis).


10.4.  Further  Manifestations  of  Orthonormality 

The orthonormality of the basis functions $\{\phi(t - n)\}$ and $\{\psi(t - n)\}$ has further consequences which we
summarize now. A knowledge of these will be useful when we generate the scaling function $\phi(t)$ and the
wavelet function $\psi(t)$ systematically in Sec. 11 from paraunitary filter banks.

The  Nyquist  Property  and  Orthonormality 

With $\phi(t) \in L^2$, the autocorrelation function $R(\tau) = \int \phi(t)\phi^*(t - \tau)\,dt$ exists for all $\tau$ because this is just
an inner product of two elements in $L^2$. Clearly $R(0) = \|\phi(t)\|^2 = 1$. Moreover the orthonormality property
$\langle \phi(t), \phi(t - n)\rangle = \delta(n)$ can be rewritten as $R(n) = \delta(n)$. Thus, in particular, $R(\tau)$ has periodic zero crossings,
at nonzero integer values of $\tau$ (Fig. 10.9). This is precisely the Nyquist property familiar to communication
engineers. The autocorrelation of the scaling function $\phi(t)$ is a Nyquist function. The same holds for the
wavelet function $\psi(t)$.


Fig. 10.9. Example of an autocorrelation of the scaling function $\phi(t)$.

Next, using Parseval's identity for $L^2$-FTs, we obtain $\langle \phi(t), \phi(t - n)\rangle = \int \Phi(\omega)\Phi^*(\omega)e^{j\omega n}\,d\omega/2\pi = \delta(n)$.
If we decompose the integral into a sum of integrals over intervals of length $2\pi$ and use the $2\pi$-periodicity of
$e^{j\omega n}$ we obtain, after some simplification:

$$\sum_{k=-\infty}^{\infty} |\Phi(\omega + 2\pi k)|^2 = 1 \quad \text{almost everywhere.} \tag{10.17}$$

This is the preceding Nyquist condition, now expressed in the frequency domain. The term almost everywhere
(Sec. 6.1) arises from the fact that we have drawn a conclusion about an integrand from the value of the
integral. Thus $\{\phi(t - n)\}$ is orthonormal if and only if the preceding equation holds. A similar result follows
for $\Psi(\omega)$, that is, orthonormality of $\{\psi(t - n)\}$ is equivalent to

$$\sum_{k=-\infty}^{\infty} |\Psi(\omega + 2\pi k)|^2 = 1, \quad \text{almost everywhere.} \tag{10.18}$$
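
The frequency-domain condition (10.17) is easy to test numerically for a scaling function whose transform is known in closed form. The sketch below assumes the Haar scaling function (the unit box on $[0, 1]$), for which $\Phi(\omega) = e^{-j\omega/2}\sin(\omega/2)/(\omega/2)$; the truncation length `K` and the function names are choices made here for illustration. Since the terms decay only like $1/k^2$, the truncated sum matches unity to a few decimal places rather than to machine precision.

```python
import numpy as np

def Phi_haar(w):
    """Fourier transform of the unit box on [0, 1] (Haar scaling function)."""
    # np.sinc(x) = sin(pi x) / (pi x), hence sin(w/2)/(w/2) = np.sinc(w / (2 pi)).
    return np.exp(-1j * w / 2) * np.sinc(w / (2 * np.pi))

def nyquist_sum(Phi, w, K=20000):
    """Truncated version of sum_k |Phi(w + 2 pi k)|^2 with k = -K, ..., K."""
    k = np.arange(-K, K + 1)
    return np.sum(np.abs(Phi(w[:, None] + 2 * np.pi * k[None, :])) ** 2, axis=1)

w = np.array([0.0, 0.3, 1.0, 2.5, np.pi])
print(nyquist_sum(Phi_haar, w))  # each entry should be close to 1
```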

Case  When  Equalities  Hold  Pointwise 

If we assume that all Fourier transforms are continuous, then equalities in the Fourier domain actually
hold pointwise. This is the most common situation; in all examples to be seen here, the following are true: (i)
the filters $G_s(e^{j\omega})$ and $H_s(e^{j\omega})$ are rational (FIR or IIR) so the frequency responses are continuous functions
of $\omega$, and (ii) $\phi(t)$ and $\psi(t)$ are not only in $L^2$ but also in $L^1$, i.e., $\phi(t), \psi(t) \in L^1 \cap L^2$. Thus, $\Phi(\omega)$ and $\Psi(\omega)$
are continuous functions (Sec. 6.3).

With the dilation equation $\Phi(\omega) = G_s(e^{j\omega/2})\Phi(\omega/2)$ holding pointwise, we have $\Phi(0) = G_s(e^{j0})\Phi(0)$.
In all our applications $\Phi(0) \neq 0$ (it is a lowpass filter), so $G_s(e^{j0}) = 1$. The power symmetry property

$$|G_s(e^{j\omega})|^2 + |G_s(-e^{j\omega})|^2 = 1$$

then implies $G_s(e^{j\pi}) = 0$. Since the highpass synthesis filter is $H_s(e^{j\omega}) = e^{j\omega}G_s^*(-e^{j\omega})$ we conclude
$H_s(e^{j0}) = 0$ and $H_s(e^{j\pi}) = -1$. Thus

$$G_s(e^{j0}) = 1, \quad G_s(e^{j\pi}) = 0, \quad H_s(e^{j0}) = 0, \quad H_s(e^{j\pi}) = -1. \tag{10.19}$$

In particular, the lowpass impulse response $g_s(n)$ therefore satisfies $\sum_n g_s(n) = 1$. Since we already have
$\sum_n |g_s(n)|^2 = 0.5$ (Theorem 10.1), we have both of the following:

$$\sum_{n=-\infty}^{\infty} g_s(n) = 1, \quad \text{and} \quad \sum_{n=-\infty}^{\infty} |g_s(n)|^2 = 0.5. \tag{10.20}$$

From the dilation equation $\Phi(\omega) = G_s(e^{j\omega/2})\Phi(\omega/2)$ we obtain $\Phi(2\pi k) = G_s(e^{j\pi k})\Phi(\pi k)$. By using the fact
that $G_s(e^{j\pi}) = 0$, and after elementary manipulations we can show that

$$\Phi(2\pi k) = 0, \quad k \neq 0. \tag{10.21}$$


That is, $\Phi(\omega)$ is itself a Nyquist function of $\omega$. If (10.17) is assumed to hold pointwise, then the above implies
that $|\Phi(0)| = 1$. Without loss of generality we will let $\Phi(0) = 1$, i.e., $\int \phi(t)\,dt = 1$. The dilation equation for
the wavelet function $\Psi(\omega)$ in (10.2) shows that $\Psi(0) = 0$ [since $H_s(e^{j0}) = 0$ by (10.19)]. That is, $\int \psi(t)\,dt = 0$.
Summarizing, the scaling and wavelet functions satisfy

$$\int_{-\infty}^{\infty} \phi(t)\,dt = 1, \quad \int_{-\infty}^{\infty} \psi(t)\,dt = 0, \quad \int_{-\infty}^{\infty} |\phi(t)|^2\,dt = \int_{-\infty}^{\infty} |\psi(t)|^2\,dt = 1, \tag{10.22}$$

where the third property follows from orthonormality. These integrals make sense because of the assumption
$\phi(t) \in L^1 \cap L^2$. Another result that follows from $\Phi(2\pi k) = \delta(k)$ is that

$$\sum_{n=-\infty}^{\infty} \phi(t - n) = 1 \quad \text{a.e.} \tag{10.23}$$

Thus the basis functions of the subspace $V_0$ themselves add up to unity. Return to the Haar basis, and notice
how beautifully everything fits together!
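
The filter constraints (10.19) and (10.20) can be checked on a concrete filter. The sketch below is an illustration only: it assumes the 4-tap Daubechies lowpass coefficients, scaled so that $\sum_n g_s(n) = 1$, as the example $g_s(n)$, and forms the highpass filter from $h_s(n) = (-1)^{n+1} g_s^*(-n-1)$. The helper `dtft` is a hypothetical name.

```python
import numpy as np

SQ3 = np.sqrt(3.0)
# Assumed example: 4-tap Daubechies lowpass filter on n = 0, ..., 3 with sum(g_s) = 1.
g_s = np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / 8.0
n_g = np.arange(4)
# h_s(-m-1) = (-1)^m g_s(m) for real g_s, so h_s lives on n = -1, -2, -3, -4.
n_h = -n_g - 1
h_s = (-1.0) ** n_g * g_s

def dtft(taps, idx, w):
    """sum_n taps(n) exp(-j w n) for a finite filter with explicit index array idx."""
    return np.sum(taps * np.exp(-1j * w * idx))

values = {
    "G_s(e^{j0})": dtft(g_s, n_g, 0.0),
    "G_s(e^{j pi})": dtft(g_s, n_g, np.pi),
    "H_s(e^{j0})": dtft(h_s, n_h, 0.0),
    "H_s(e^{j pi})": dtft(h_s, n_h, np.pi),
    "sum g_s(n)": np.sum(g_s),
    "sum |g_s(n)|^2": np.sum(np.abs(g_s) ** 2),
}
for name, value in values.items():
    print(name, np.round(value, 12))
```

Up to rounding, the six printed values should agree with 1, 0, 0, -1, 1 and 0.5, as required by (10.19) and (10.20).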


10.5. Generating wavelet and multiresolution basis by design of $\phi(t)$

Most of the well-known wavelet basis families in recent times have been generated by first finding a scaling
function $\phi(t)$ such that it is a valid generator of multiresolution, and then generating $\psi(t)$ from $\phi(t)$. The
first step therefore is to identify the conditions under which a function $\phi(t)$ will be a valid scaling function
(i.e., it will generate a multiresolution). Once this is done, and we successfully identify the coefficients $g_s(n)$
in the dilation equation for $\phi(t)$, then we can identify the wavelet function $\psi(t)$ using the second dilation
equation in (10.1). From Theorem 10.1 we know that if $\psi(t)$ is computed in this way, then $\{2^{k/2}\psi(2^k t - n)\}$
is an orthonormal wavelet basis for $L^2$. The following results can be deduced from the many detailed results
presented in [Daubechies, 1992].

Theorem 10.2. Let $\phi(t)$ satisfy the following four conditions: (a) $\phi(t) \in L^1 \cap L^2$, (b) $\int \phi(t)\,dt \neq 0$
(i.e., $\Phi(0) \neq 0$), (c) $\phi(t) = 2\sum_n g_s(n)\phi(2t - n)$ for some $\{g_s(n)\}$, and (d) $\{\phi(t - n)\}$ is an orthonormal
sequence. Then the following are true.

1. $\phi(t)$ generates a multiresolution. That is, if we define the space $V_k$ to be the closure of the span of
$\{2^{k/2}\phi(2^k t - n)\}$, then the set of spaces $\{V_k\}$ satisfies the six conditions in Definition 10.1.

2. Define $\psi(t) = 2\sum_n (-1)^{n+1} g_s^*(-n - 1)\phi(2t - n)$. Then $\psi(t)$ generates an orthonormal wavelet basis for
$L^2$. That is, $\{2^{k/2}\psi(2^k t - n)\}$, with $k$ and $n$ varying over all integers, is an orthonormal basis for $L^2$. In
fact, for fixed $k$, the functions $\{2^{k/2}\psi(2^k t - n)\}$ form an orthonormal basis for the subspace $W_k$ defined
in Sec. 10.1. ◊

Comments. In many examples $\phi(t) \in L^2$, and it is compactly supported. Then it is naturally in $L^1$ as
well (Eq. (6.1)), so the assumption $\phi(t) \in L^1 \cap L^2$ is not too restrictive. Since $L^1 \cap L^2$ is dense in $L^2$, the
above construction still gives a wavelet basis for $L^2$. Notice also that the orthonormality of $\{\phi(t - n)\}$ implies
orthonormality of $\{\sqrt{2}\phi(2t - n)\}$. So the recursion $\phi(t) = 2\sum_n g_s(n)\phi(2t - n)$ is a Fourier series for $\phi(t)$ in
$L^2$. Thus the condition $\sum_n |g_s(n)|^2 = 0.5$ is automatically implied. This, therefore, is not explicitly stated
as part of the conditions in the theorem.

Orthonormalization

From Sec. 10.4 we know that orthonormality of $\{\phi(t - n)\}$ is equivalent to

$$\sum_{k=-\infty}^{\infty} |\Phi(\omega + 2\pi k)|^2 = 1. \tag{10.24}$$

Suppose now that this is not satisfied but the weaker condition

$$a \leq \sum_{k=-\infty}^{\infty} |\Phi(\omega + 2\pi k)|^2 \leq b \tag{10.25}$$

holds for some $a > 0$ and $b < \infty$. Then it can be shown that we can at least obtain a Riesz basis (Sec. 7.3)
of the form $\{\phi(t - n)\}$ for $V_0$. We can also normalize it to obtain an orthonormal sequence $\{\tilde{\phi}(t - n)\}$ from
which an orthonormal wavelet basis can be generated in the usual way. The following theorem summarizes
the main results.

Theorem 10.3. Let (a) $\phi(t) \in L^1 \cap L^2$, (b) $\int \phi(t)\,dt \neq 0$ (i.e., $\Phi(0) \neq 0$), and (c) $\phi(t) = 2\sum_n g_s(n)\phi(2t - n)$
with $\sum_n |g_s(n)|^2 < \infty$. Instead of the orthonormality condition (10.24), let (10.25) hold for some $a > 0$
and $b < \infty$. Then the following are true.

1. $\{\phi(t - n)\}$ is a Riesz basis for the closure $V_0$ of its span.

2. $\phi(t)$ generates a multiresolution. That is, if we define the space $V_k$ to be the closure of the span of
$\{2^{k/2}\phi(2^k t - n)\}$, then the set of spaces $\{V_k\}$ satisfies the six conditions in Definition 10.1. ◊

Orthonormalization. If we define a new function $\tilde{\phi}(t)$ in terms of its Fourier transform as follows,

$$\tilde{\Phi}(\omega) = \frac{\Phi(\omega)}{\left(\sum_k |\Phi(\omega + 2\pi k)|^2\right)^{0.5}}, \tag{10.26}$$

then $\tilde{\phi}(t)$ generates an orthonormal multiresolution. It satisfies a dilation equation similar to (10.1). From
this we can define a corresponding wavelet function $\tilde{\psi}(t)$ in the usual way. That is, if $\tilde{\phi}(t) = 2\sum_n g_s(n)\tilde{\phi}(2t - n)$,
then choose $\tilde{\psi}(t) = 2\sum_n h_s(n)\tilde{\phi}(2t - n)$, where $h_s(n) = (-1)^{n+1} g_s^*(-n - 1)$. This wavelet $\tilde{\psi}(t)$ then
generates an orthonormal wavelet basis for $L^2$. Note that the basis is not necessarily compactly supported
if we start with compactly supported $\phi(t)$. An example will be seen in Fig. 13.2(b) later.

Example 10.2. Battle-Lemarié orthonormal wavelets from splines. In Example 10.1 we
considered a triangular $\phi(t)$ (Fig. 10.4) which generates a nonorthonormal multiresolution. In this example

we  have 


(10.27) 


and it can be shown that

$$\sum_{k=-\infty}^{\infty} |\Phi(\omega + 2\pi k)|^2 = \frac{2 + \cos\omega}{2}. \tag{10.28}$$

The inequality (10.25) is satisfied with $a = 1/2$ and $b = 3/2$. Thus we have a Riesz basis $\{\phi(t - n)\}$ for
$V_0$. From this scaling function we can obtain the normalized function $\tilde{\Phi}(\omega)$ as above and then generate the
wavelet function $\tilde{\psi}(t)$ as explained above. This gives an orthonormal wavelet basis for $L^2$. But $\tilde{\phi}(t)$ does not
have compact support (unlike $\phi(t)$). That is, the wavelet function $\tilde{\psi}(t)$ generating the orthonormal wavelet
basis is not compactly supported either. ◊
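
The orthonormalization step (10.26) can be tried out numerically for a triangular pulse. The sketch below rests on assumptions made here for illustration: the triangle has base width 2 and unit energy, so that $|\Phi(\omega)| = \sqrt{3/2}\,(\sin(\omega/2)/(\omega/2))^2$ (a normalization chosen because it gives the bounds $a = 1/2$ and $b = 3/2$ quoted above), and the linear phase factor is dropped because it does not affect $|\Phi(\omega)|^2$. The function names and the truncation length `K` are hypothetical.

```python
import numpy as np

def Phi_triangle(w):
    """Magnitude of the transform of a unit-energy triangle of base width 2 (assumed model)."""
    # np.sinc(x) = sin(pi x) / (pi x), hence sin(w/2)/(w/2) = np.sinc(w / (2 pi)).
    return np.sqrt(1.5) * np.sinc(np.asarray(w, dtype=float) / (2 * np.pi)) ** 2

def periodized_energy(Phi, w, K=200):
    """Truncated sum_k |Phi(w + 2 pi k)|^2, k = -K, ..., K, for w of any shape."""
    w = np.asarray(w, dtype=float)
    k = np.arange(-K, K + 1)
    return np.sum(np.abs(Phi(w[..., None] + 2 * np.pi * k)) ** 2, axis=-1)

def Phi_orthonormalized(w):
    """Eq. (10.26): divide Phi by the square root of its periodized energy."""
    return Phi_triangle(w) / np.sqrt(periodized_energy(Phi_triangle, w))

w = np.linspace(-np.pi, np.pi, 9)
S = periodized_energy(Phi_triangle, w)
print(S.min(), S.max())                            # bounds a and b in (10.25)
print(periodized_energy(Phi_orthonormalized, w))   # close to 1 after normalization
```

Under these assumptions the periodized energy follows $(2 + \cos\omega)/2$, and after the division in (10.26) the same sum is flat at unity, which is the orthonormality condition (10.24).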


11.  ORTHONORMAL  WAVELET  BASIS  FROM  PARAUNITARY  FILTER  BANKS 

The wisdom gained from the multiresolution viewpoint (Sec. 10) tells us there is a close connection between
wavelet bases and two channel digital filter banks. In fact we obtained the equations of a paraunitary filter
bank just by imposing the orthonormality condition on the multiresolution basis functions $\{\phi(t - n)\}$. In
this section we will present the complete story: suppose we start from a two channel digital filter bank with
the paraunitary property. Can we derive an orthonormal wavelet basis from this? To be more specific,
return to the dilation equations (10.1) or equivalently (10.2). Here $g_s(n)$ and $h_s(n)$ are the impulse response
coefficients of the two synthesis filters $G_s(e^{j\omega})$ and $H_s(e^{j\omega})$ in the digital filter bank. Given these two filters,
can we "solve" for $\phi(t)$ and $\psi(t)$? If so, does this generate an orthonormal basis for $L^2$ space? In this
section we will answer some of these questions. Unlike in any other section, we will also indicate a sketch of
the proof for each major result, in view of the importance of these in modern signal processing theory.

Recall first that under some mild conditions (Sec. 10.4) we can prove that the filters have to satisfy
(10.19), (10.20), if we need to generate wavelet and multiresolution bases successfully. We will impose these
at the outset. By repeated application of the dilation equation we get $\Phi(\omega) = G_s(e^{j\omega/2})G_s(e^{j\omega/4})\Phi(\omega/4)$.
Further indefinite repetition yields an infinite product. Using the condition $\Phi(0) = 1$ which we justified at
the end of Sec. 10.4, we obtain the infinite products

$$\Phi(\omega) = \prod_{k=1}^{\infty} G_s(e^{j\omega/2^k}) = G_s(e^{j\omega/2}) \prod_{k=2}^{\infty} G_s(e^{j\omega/2^k}), \tag{11.1a}$$

$$\Psi(\omega) = H_s(e^{j\omega/2}) \prod_{k=2}^{\infty} G_s(e^{j\omega/2^k}). \tag{11.1b}$$

The first issue to be addressed is the convergence of the infinite products above. For this we need to review
some preliminaries on infinite products [Apostol, 1974], [Rudin, 1966].

The ideal bandpass wavelet re-derived from the digital filter bank. Before we address the
mathematical details, let us consider a simple example. Suppose the pair of filters $G_s(e^{j\omega})$ and $H_s(e^{j\omega})$ are
ideal brickwall lowpass and highpass filters as in Fig. 4.8(a). Then we can verify, by making simple sketches
of a few terms in (11.1), that the above infinite products yield the functions $\Phi(\omega)$ and $\Psi(\omega)$ shown in Fig.
10.1. That is, the ideal bandpass wavelet is indeed related to the ideal paraunitary filter bank by means of
the above infinite product!
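
A second simple case, in which the infinite product (11.1a) can be compared with a closed form, is the Haar filter. The sketch below is only an illustration under assumptions made here: it takes $g_s(0) = g_s(1) = 1/2$, truncates the product after a finite number of terms, and compares the result with the transform of the unit box on $[0, 1]$, namely $e^{-j\omega/2}\sin(\omega/2)/(\omega/2)$. The function names are hypothetical.

```python
import numpy as np

def G_haar(w):
    """G_s(e^{jw}) for the assumed Haar filter g_s(0) = g_s(1) = 1/2."""
    return 0.5 * (1.0 + np.exp(-1j * w))

def partial_product(G, w, num_terms):
    """prod_{k=1}^{num_terms} G(e^{j w / 2^k}), a truncation of the product (11.1a)."""
    w = np.asarray(w, dtype=float)
    P = np.ones_like(w, dtype=complex)
    for k in range(1, num_terms + 1):
        P = P * G(w / 2.0 ** k)
    return P

def Phi_haar_closed_form(w):
    """Transform of the unit box on [0, 1]; np.sinc(x) = sin(pi x) / (pi x)."""
    return np.exp(-1j * w / 2) * np.sinc(w / (2 * np.pi))

w = np.linspace(-30.0, 30.0, 61)
err = np.max(np.abs(partial_product(G_haar, w, 40) - Phi_haar_closed_form(w)))
print(err)  # small: the truncated product is already very close to the limit
```

The factors approach unity geometrically fast as $k$ grows, which is the behavior that the convergence results of the next subsection make precise.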

11.1.  Convergence  of  Infinite  Products 

To define convergence of a product of the form $\prod_{k=1}^{\infty} a_k$ consider the sequence $\{p_n\}$ of partial products
$p_n = \prod_{k=1}^{n} a_k$. If this converges to a (complex) number $A$ with $0 < |A| < \infty$ we say that the infinite product
converges to $A$. Convergence to zero should be defined more carefully to avoid degenerate situations (e.g., if
$a_1 = 0$, then $p_n = 0$ for all $n$ regardless of the remaining terms $a_k$, $k > 1$). We use the definition in [Apostol,
1974]. The infinite product is said to converge to zero if and only if $a_k = 0$ for a finite nonzero number of
values of $k$, and if the product with these $a_k$'s deleted converges to a nonzero value.

Useful  Facts  About  Infinite  Products 

1. Whenever $\prod_{k=1}^{\infty} a_k$ converges, it can be shown that $a_k \to 1$ as $k \to \infty$. For this reason it is convenient
to write $a_k = 1 + b_k$.

2. We say that $\prod_{k=1}^{\infty} (1 + b_k)$ converges absolutely if $\prod_{k=1}^{\infty} (1 + |b_k|)$ converges. Absolute convergence of
$\prod_{k=1}^{\infty} (1 + b_k)$ implies its convergence.

3. It can be shown that the product $\prod_{k=1}^{\infty} (1 + |b_k|)$ converges if and only if the sum $\sum_{k=1}^{\infty} |b_k|$ converges.
That is, $\prod_{k=1}^{\infty} (1 + b_k)$ converges absolutely if and only if $\sum_{k=1}^{\infty} b_k$ converges absolutely.

Examples. The product $\prod_{k=1}^{\infty} (1 + k^{-2})$ converges because $\sum_{k=1}^{\infty} k^{-2}$ converges. Similarly $\prod_{k=1}^{\infty} (1 - k^{-2})$
converges because it converges absolutely, by the preceding example. The product $\prod_{k=1}^{\infty} (1 + k^{-1})$ does not
converge because $\sum_{k=1}^{\infty} k^{-1}$ diverges. Products such as $\prod_{k=1}^{\infty} (1/k^2)$ do not converge because the terms do
not approach unity as $k \to \infty$. ◊
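
Fact 3 can be illustrated with partial products. The sketch below uses two products chosen here for illustration: $\prod_k (1 + 1/k^2)$, whose partial products settle down (the limit is known to be $\sinh(\pi)/\pi$), and $\prod_k (1 + 1/k)$, whose partial product telescopes to $n + 1$ and therefore grows without bound. The function name `partial_product` is hypothetical.

```python
import math

def partial_product(b, n):
    """p_n = prod_{k=1}^{n} (1 + b(k))."""
    p = 1.0
    for k in range(1, n + 1):
        p *= 1.0 + b(k)
    return p

for n in (10, 1000, 100000):
    convergent = partial_product(lambda k: 1.0 / k ** 2, n)  # sum of 1/k^2 converges
    divergent = partial_product(lambda k: 1.0 / k, n)        # sum of 1/k diverges
    print(n, convergent, divergent)

print(math.sinh(math.pi) / math.pi)  # limit of the first product
```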

Uniform  Convergence 

A sequence $\{p_n(z)\}$ of functions of the complex variable $z$ converges uniformly to a function $p(z)$ on a
set $S$ in the complex plane if the convergence rate is the same everywhere in $S$. More precisely, if we are
given $\epsilon > 0$, we can find $N$ such that $|p_n(z) - p(z)| < \epsilon$ for every $z \in S$, as long as $n \geq N$. The crucial thing
is that $N$ depends only on $\epsilon$ and not on $z$, as long as $z \in S$. A similar definition applies for functions of real
variables.

We say that an infinite product of functions $\prod_{k=1}^{\infty} a_k(z)$ converges at a point $z$ if the sequence of partial
products $p_n(z) = \prod_{k=1}^{n} a_k(z)$ converges as described previously. If this convergence of $p_n(z)$ is uniform in a
set $S$ we say that the infinite product converges uniformly on $S$. Uniform convergence has similar advantages
as in the case of infinite summations. For example, if each of the functions $a_k(\omega)$ is continuous on the real
interval $[\omega_1, \omega_2]$ then uniform convergence of the infinite product $A(\omega) = \prod_{k=1}^{\infty} a_k(\omega)$ on $[\omega_1, \omega_2]$ implies
that the limit $A(\omega)$ is continuous on $[\omega_1, \omega_2]$. We saw above that convergence of infinite products can be
related to that of infinite summations. The following theorem [Rudin, 1966] makes the connection between
uniform convergence of summations and uniform convergence of products.

Theorem 11.1. Let $b_k(z)$, $k \geq 1$ be a sequence of bounded functions of the complex variable $z$, such
that $\sum_{k=1}^{\infty} |b_k(z)|$ converges uniformly on a compact set† $S$ in the complex $z$ plane. Then the infinite
product $\prod_{k=1}^{\infty} (1 + b_k(z))$ converges uniformly on $S$. Moreover, this product is zero for some $z_0$ if and only
if $1 + b_k(z_0) = 0$ for some $k$. ◊

Uniform convergence and analyticity. We know that if a sequence of continuous functions converges
uniformly to a function, then the limit is also continuous. A similar result is true for analytic functions. That
is, if a sequence $\{f_n(s)\}$ of analytic functions converges uniformly to a function $f(s)$ then $f(s)$ is analytic as
well. For a more precise statement of this result see Theorem 10.28 in Rudin [1966].

† For us, a compact set means any closed bounded set in the complex plane or on the real line. Examples:
(i) all points on and inside a circle in the complex plane, and (ii) the closed interval $[a, b]$ on the real line.

11.2.  Infinite  Product  Defining  the  Scaling  Function 

Return now to the infinite product (11.1a). As justified in Sec. 10.4, we assume $G_s(e^{j\omega})$ to be continuous,
$G_s(e^{j0}) = 1$, and $\Phi(0) \neq 0$. Note that $G_s(e^{j0}) = 1$ is necessary for the infinite product to converge (because
convergence of $\prod_k a_k$ implies that $a_k \to 1$; apply this for $\omega = 0$). The following convergence result is
fundamental.

Theorem 11.2. Convergence of the infinite product. Let $G_s(e^{j\omega}) = \sum_{n=-\infty}^{\infty} g_s(n)e^{-j\omega n}$. Assume
that $G_s(e^{j0}) = 1$, and $\sum_n |n g_s(n)| < \infty$. Then

1. The infinite product (11.1a) converges pointwise for all $\omega$. In fact it converges absolutely for all $\omega$, and
uniformly on compact sets (i.e., closed bounded sets, e.g., sets of the form $[\omega_1, \omega_2]$).

2. The quantity $G_s(e^{j\omega})$, as well as the limit $\Phi(\omega)$ of the infinite product (11.1a), are continuous functions
of $\omega$.

3. $G_s(e^{j\omega})$ is in $L^2[0, 2\pi]$. ◊

Since the condition $\sum_n |n g_s(n)| < \infty$ implies $\sum_n |g_s(n)| < \infty$, the filter $G_s(e^{j\omega})$ is restricted to be
stable. But the above result holds whether $g_s(n)$ is FIR or IIR.

Sketch of proof. Theorem 11.1 allows us to reduce the convergence of the product to the convergence
of an infinite sum. For this we have to write $G_s(e^{j\omega})$ in the form $1 - F(e^{j\omega})$ and then consider the
summation $\sum_{k=1}^{\infty} |F(e^{j\omega/2^k})|$. Since $G_s(e^{j0}) = 1 = \sum_n g_s(n)$, we can write $G_s(e^{j\omega}) = 1 - (1 - G_s(e^{j\omega})) =
1 - \sum_n g_s(n)(1 - e^{-j\omega n})$. But $|\sum_n g_s(n)(1 - e^{-j\omega n})| \leq 2\sum_n |g_s(n)\sin(\omega n/2)| \leq |\omega| \sum_n |n g_s(n)|$ (use
$|\sin x / x| \leq 1$). Since $\sum_n |n g_s(n)|$ is assumed to converge, we have $|\sum_n g_s(n)(1 - e^{-j\omega n})| \leq c|\omega|$. Using
this, and the fact that $\sum_{k=1}^{\infty} 2^{-k}$ converges, we can complete the proof of part 1 (apply Theorem 11.1).
Since $\sum_n |n g_s(n)| < \infty$ implies in particular that $g_s(n) \in \ell^1$, its $\ell^1$-FT $G_s(e^{j\omega})$ is continuous (Sec. 6.3). The
continuity of $G_s(e^{j\omega})$ together with uniform convergence of the infinite product implies that the pointwise
limit $\Phi(\omega)$ is also continuous. Finally, since $\ell^1 \subset \ell^2$ (Sec. 6.2), we have $g_s(n) \in \ell^2$, that is, $G_s(e^{j\omega}) \in L^2[0, 2\pi]$
as well. ∇ ∇ ∇

11.3. Orthonormal Wavelet Basis From Paraunitary Filter Bank

We now consider the behavior of the infinite product $\prod_{k=1}^{\infty} G_s(e^{j\omega/2^k})$ when $G_s(e^{j\omega})$ comes from a
paraunitary filter bank. The paraunitary property implies that $G_s(e^{j\omega})$ is power symmetric. If we impose some
further mild conditions on $G_s(e^{j\omega})$ then the scaling function $\phi(t)$ generates an orthonormal multiresolution
basis $\{\phi(t - n)\}$. We can then obtain an orthonormal wavelet basis $\{\psi_{kn}(t)\}$ (Theorems 10.1, 10.2). The
main results will be given in Theorems 11.3-11.6 and 12.1.

First let us define the truncated partial products $P_n(\omega)$. Since $G_s(e^{j\omega})$ has period $2\pi$, the term $G_s(e^{j\omega/2^k})$
has period $2^{k+1}\pi$. For this reason the partial product $\prod_{k=1}^{n} G_s(e^{j\omega/2^k})$ has period $2^{n+1}\pi$, and we can regard
the region $[-2^n\pi, 2^n\pi]$ to be the fundamental period. Let us truncate the partial product to this region, and
define

$$P_n(\omega) = \begin{cases} \prod_{k=1}^{n} G_s(e^{j\omega/2^k}), & \text{for } -2^n\pi \leq \omega \leq 2^n\pi \\ 0 & \text{otherwise.} \end{cases} \tag{11.2}$$

This quantity will be useful later. We will see that this is in $L^2(R)$, and we can talk about $p_n(t)$, its inverse
$L^2$-FT.

Theorem 11.3. Let $G_s(e^{j\omega})$ be as in Theorem 11.2. In addition let it be power symmetric, that is,
$|G_s(e^{j\omega})|^2 + |G_s(-e^{j\omega})|^2 = 1$. (Notice in particular that this implies $G_s(e^{j\pi}) = 0$, since $G_s(e^{j0}) = 1$.) Then
the following are true.

1. $\int_0^{2\pi} |G_s(e^{j\omega})|^2\,d\omega/2\pi = 0.5$.

2. The truncated partial product $P_n(\omega)$ is in $L^2$, and $\int_{-\infty}^{\infty} |P_n(\omega)|^2\,d\omega/2\pi = 1$ for all $n$. Moreover the inverse
$L^2$-FT, denoted as $p_n(t)$, gives rise to an orthonormal sequence $\{p_n(t - k)\}$, that is, $\langle p_n(t - k), p_n(t - i)\rangle =
\delta(k - i)$ for any $n \geq 1$.

3. The limit $\Phi(\omega)$ of the infinite product (11.1a) is in $L^2$, hence it has an inverse $L^2$-FT, $\phi(t) \in L^2$.
Moreover $\|\phi(t)\|_2 \leq 1$. ◊

Sketch of proof. Part 1 follows by integrating both sides of $|G_s(e^{j\omega})|^2 + |G_s(-e^{j\omega})|^2 = 1$. The
integral in part 2 is $\int_{-2^n\pi}^{2^n\pi} \prod_{k=1}^{n} |G_s(e^{j\omega/2^k})|^2 \, d\omega/2\pi$, which we can split into two terms like
$\int_{-2^n\pi}^{0} + \int_{0}^{2^n\pi}$. Using the $2\pi$-periodicity and the power symmetric property of $G_s(e^{j\omega})$, we obtain
$\int |P_n|^2 \, d\omega = \int |P_{n-1}|^2 \, d\omega$. Repeated application of this, used together with part 1, yields
$\int |P_n(\omega)|^2 \, d\omega/2\pi = 1$. The proof of orthonormality of $\{p_n(t-k)\}$ follows essentially in a similar way by
working with the modified integral $\int |P_n(\omega)|^2 e^{j\omega(k-i)} \, d\omega/2\pi$, and using the half-band property of
$|G_s(e^{j\omega})|^2$.

The third part is the most subtle one, and uses Fatou’s Lemma for Lebesgue integrals (Sec. 6.1). For
this, define $g_n(\omega) = |P_n(\omega)|^2$. Then $\{g_n(\omega)\}$ is a sequence of nonnegative integrable functions such that
$g_n(\omega) \to |\Phi(\omega)|^2$ pointwise for each $\omega$. Moreover, since $\int g_n(\omega) \, d\omega = 2\pi$ (from part 2), Fatou’s lemma assures
us that $|\Phi(\omega)|^2$ is integrable with integral $\le 2\pi$. This proves part 3. ▽▽▽
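Part 2 of the theorem can be checked numerically. The following is a minimal sketch, assuming real FIR coefficients normalized so that $G_s(e^{j0}) = 1$; the function name `truncated_product_energy` and the two test filters are illustrative choices, not part of the original text. Because $|P_n(\omega)|^2$ is a trigonometric polynomial over its fundamental period, a uniform Riemann sum with enough samples evaluates the integral essentially exactly.

```python
import numpy as np

def truncated_product_energy(g, n, samples_per_unit=64):
    """Return (1/2pi) * integral of |P_n(w)|^2 over [-2^n pi, 2^n pi].

    g holds real FIR coefficients g_s(0), ..., g_s(N) with sum(g) == 1.
    """
    g = np.asarray(g, dtype=float)
    taps = np.arange(len(g))
    m = samples_per_unit * 2 ** n            # samples across one full period
    half_width = 2 ** n * np.pi
    w = -half_width + 2 * half_width * np.arange(m) / m
    p_sq = np.ones(m)
    for k in range(1, n + 1):
        # G_s(e^{j w / 2^k}) = sum_i g_s(i) exp(-j w i / 2^k)
        G = np.exp(-1j * np.outer(w / 2 ** k, taps)) @ g
        p_sq *= np.abs(G) ** 2
    # Uniform Riemann sum over one period of a trigonometric polynomial.
    return float(p_sq.sum() * (2 * half_width / m) / (2 * np.pi))

haar = [0.5, 0.5]                            # G_s(z) = (1 + z^-1)/2
stretched = [0.5, 0.0, 0.0, 0.5]             # G_s(z) = (1 + z^-3)/2
for n in range(1, 6):
    print(n, truncated_product_energy(haar, n), truncated_product_energy(stretched, n))
```

Both filters are power symmetric, so both give unit energy for every $n$, even though only the first one leads to an orthonormal $\{\phi(t-k)\}$ in the limit.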


It is most interesting that the truncated partial products $P_n(\omega)$ give rise to orthonormal sequences
$\{p_n(t-k)\}$. This orthonormality is induced by the paraunitary property, more precisely the power symmetry
property of $G_s(e^{j\omega})$. This is consistent with the fact that the filter-bank type of basis introduced in Sec.
4.8 for the discrete time functions $x(n) \in \ell^2$ is an orthonormal basis for $\ell^2$ whenever the filter bank is
paraunitary.

Since $\Phi(\omega)$, the Fourier transform of the scaling function, is the pointwise limit of $\{P_n(\omega)\}$ as
$n \to \infty$, this leads to the hope that $\{\phi(t-k)\}$ is also an orthonormal sequence (so that we can generate a
multiresolution and then a wavelet basis as in Theorems 10.1, 10.2). This however is not always true!

The crux of the reason is that $\Phi(\omega)$ is only the pointwise limit of $\{P_n(\omega)\}$, and not necessarily the $L^2$
limit! The distinction is subtle (see below). The pointwise limit property means that, for any fixed $\omega$, the
function $P_n(\omega)$ approaches $\Phi(\omega)$. The $L^2$ limit property means that $\int |P_n(\omega) - \Phi(\omega)|^2 \, d\omega \to 0$. Neither of
these limit properties implies the other, that is neither is stronger than the other. It can be shown that it is
the $L^2$ limit which propagates the orthonormality property, and this is what we want.

Theorem 11.4. Let $\{p_n(t-k)\}$ be an orthonormal sequence for each $n$, that is, $\langle p_n(t-k), p_n(t-i) \rangle =
\delta(k-i)$. Suppose $p_n(t) \to \phi(t)$ in the $L^2$ sense. Then $\{\phi(t-k)\}$ is an orthonormal sequence. ◇

Proof. If we take limits as $n \to \infty$, we can write

$$\lim_{n \to \infty} \langle p_n(t-k), p_n(t-i) \rangle = \langle \lim_{n \to \infty} p_n(t-k), \lim_{n \to \infty} p_n(t-i) \rangle$$

This movement of the “limit” sign past the inner product sign is allowed (by continuity of inner products,
Sec. 6.2) provided the limits in the second expression are $L^2$ limits. By the conditions of the theorem, the
left side of the above equation is $\delta(k-i)$ whereas the right side is $\langle \phi(t-k), \phi(t-i) \rangle$. So the result follows.

▽▽▽

$L^2$ Convergence Versus Pointwise Convergence.

The fact that $L^2$ limits are not necessarily pointwise limits is obvious from the fact that differences at
a countable set of points do not affect integrals. The fact that pointwise limits are not necessarily $L^2$ limits
is demonstrated by the sequence of $L^2$ functions $\{f_n(t)\}$, with $f_n(t)$ as in Fig. 11.1.


Fig. 11.1. A sequence $\{f_n(t)\}$ whose pointwise limit is not a limit in the $L^2$-sense.

Note that $f_n(t) \to 0$ pointwise for each $t$, that is, the pointwise limit is $f(t) = 0$. So $\|f_n(t) - f(t)\| =
\|f_n(t)\| = 1$ for all $n$, hence $\|f_n(t) - f(t)\|$ does not go to zero as $n \to \infty$. Thus $f(t)$ is not the $L^2$ limit
of $f_n(t)$. Notice in this example that $1 = \lim_{n \to \infty} \int |f_n(t)|^2 \, dt \ne \int \lim_{n \to \infty} |f_n(t)|^2 \, dt = 0$. This is consistent
with the fact that the Lebesgue dominated convergence theorem cannot be applied here: there is no integrable
function that dominates $|f_n(t)|^2$ for all $n$. In this example, the sequence $\{f_n(t)\}$ does not converge in the
$L^2$ sense (in fact $\|f_n(t) - f_m(t)\|^2 = 2$ for $n \ne m$, so $\{f_n\}$ is not a Cauchy sequence in $L^2$).
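The behavior described above can be reproduced with a concrete sequence. The sketch below assumes $f_n(t)$ is a unit-height pulse on $[n, n+1)$; this is a hypothetical stand-in chosen because it has the stated properties ($\|f_n\| = 1$, $\|f_n - f_m\|^2 = 2$), and it is not necessarily the sequence drawn in Fig. 11.1.

```python
import numpy as np

def f(n, t):
    """Unit-height pulse on [n, n+1): an assumed stand-in for Fig. 11.1."""
    t = np.asarray(t, dtype=float)
    return ((t >= n) & (t < n + 1)).astype(float)

dt = 2.0 ** -10                      # dyadic step, so pulse edges fall on samples
t = np.arange(0.0, 16.0, dt)

def norm_sq(x):
    """Riemann-sum approximation of the squared L2 norm on the grid."""
    return float(np.sum(x ** 2) * dt)

t0 = 2.5                             # any fixed time instant
for n in (2, 3, 5, 10):
    print(n, float(f(n, t0)), norm_sq(f(n, t)), norm_sq(f(n, t) - f(n + 1, t)))
```

For any fixed `t0` the values `f(n, t0)` are eventually zero, while the norm stays at one and the distance between distinct members stays at $\sqrt{2}$.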

Some facts pertaining to pointwise and $L^2$ convergences: It can be shown that if $f_n(t) \to f(t)$ in the $L^2$
sense and $f_n(t) \to g(t) \in L^2$ pointwise as well, then $f(t) = g(t)$ a.e. In particular $\|f(t) - g(t)\| = 0$ and
$\|f(t)\| = \|g(t)\|$. It can also be shown that if $f_n(t) \to f(t)$ in the $L^2$ sense, then $\|f_n(t)\| \to \|f(t)\|$. Finally if
$f_n(t) \to f(t) \in L^2$ pointwise a.e., and $\|f_n(t)\| \to \|f(t)\|$, then $f_n(t) \to f(t)$ in the $L^2$ sense as well [Rudin, 1966].

Theorem 11.5. Orthonormal wavelet basis. Let the filter $G_s(e^{j\omega}) = \sum_{n=-\infty}^{\infty} g_s(n) e^{-j\omega n}$ satisfy
the following properties.

1. $G_s(e^{j0}) = 1$,

2. $\sum_n |n g_s(n)| < \infty$,

3. $|G_s(e^{j\omega})|^2 + |G_s(-e^{j\omega})|^2 = 1$ (power-symmetry), and

4. $G_s(e^{j\omega}) \ne 0$ for $\omega \in [-0.5\pi, 0.5\pi]$.

Then the infinite product (11.1a) converges to a limit $\Phi(\omega) \in L^2$, and its inverse FT $\phi(t)$ is such that
$\{\phi(t-n)\}$ is an orthonormal sequence. Defining the wavelet function $\psi(t)$ as usual, i.e., as in (10.10), the
sequence $\{2^{k/2}\psi(2^k t - n)\}$ (with $k$ and $n$ varying over all integers) forms an orthonormal wavelet basis for
$L^2$. ◇

Sketch of proof. We will show that the sequence $\{P_n(\omega)\}$ of partial products converges to $\Phi(\omega)$ in the $L^2$
sense, that is $\int |P_n(\omega) - \Phi(\omega)|^2 \, d\omega \to 0$, so that $p_n(t) \to \phi(t)$ in the $L^2$ sense. The desired result then follows in
view of Theorems 11.3 and 11.4. The key tool in the proof is the dominated convergence theorem for Lebesgue
integrals (Sec. 6.1). First, the condition $G_s(e^{j\omega}) \ne 0$ in $[-0.5\pi, 0.5\pi]$ implies that $\Phi(\omega) \ne 0$ in $[-\pi, \pi]$.
Since $|\Phi(\omega)|^2$ is continuous (Theorem 11.2) it has a minimum value $c^2 > 0$ in $[-\pi, \pi]$. Now the truncated
partial product $P_n(\omega)$ can always be written as $P_n(\omega) = \Phi(\omega)/\Phi(\omega/2^n)$ in its region of support. Since
$|\Phi(\omega/2^n)|^2 \ge c^2$ in $[-2^n\pi, 2^n\pi]$, we have $|P_n(\omega)|^2 \le |\Phi(\omega)|^2/c^2$ for all $\omega$. Define $Q_n(\omega) = |P_n(\omega) - \Phi(\omega)|^2$.
Then using $|P_n(\omega)|^2 \le |\Phi(\omega)|^2/c^2$ we can show that $Q_n(\omega) \le a|\Phi(\omega)|^2$ for some constant $a$. Since the
right hand side is integrable, and since $Q_n(\omega) \to 0$ pointwise (Theorem 11.2), we can use the dominated
convergence theorem (Sec. 6.1) to conclude that $\lim_n \int Q_n(\omega) \, d\omega = \int \lim_n Q_n(\omega) \, d\omega = 0$. This completes
the proof. ▽▽▽


Computing  the  Scaling  and  Wavelet  Functions 

Given the coefficients $g_s(n)$ of the filter $G_s(e^{j\omega})$, how do we compute the scaling function $\phi(t)$ and the
wavelet function $\psi(t)$? Since we can compute $\psi(t)$ using $\psi(t) = 2 \sum_{n=-\infty}^{\infty} (-1)^{n+1} g_s^*(-n-1) \phi(2t-n)$, the key
issue is the computation of $\phi(t)$. In the preceding theorems, $\phi(t)$ was defined only as an inverse FT of the
infinite product $\Phi(\omega)$ given in (11.1a). Since an $L^2$ function is determined only in the a.e. sense, this way of
defining $\phi(t)$ itself does not fully determine $\phi(t)$. Recall however, that the infinite product for $\Phi(\omega)$ was only a
consequence of the more fundamental equation, namely the dilation equation $\phi(t) = 2 \sum_{n=-\infty}^{\infty} g_s(n) \phi(2t-n)$.

In practice, $\phi(t)$ is computed using this equation (which is often a finite sum, see Sec. 12). The procedure
is recursive, that is we assume an initial solution for the function $\phi(t)$, substitute it into the right hand
side of the dilation equation, thereby recompute $\phi(t)$, and then repeat the process. Details of this, and
discussions on convergence of this procedure can be found in [Daubechies, 1992], [Chui, 1992a], [Daubechies
and Lagarias, 1991], and [Rioul, 1991].
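The recursive procedure can be sketched in a few lines for the FIR case. This is a minimal illustration under stated assumptions, not the specific algorithm of the cited references: the initial solution is taken to be the unit box on $[0, 1)$, the function name `cascade` is hypothetical, and the 4-tap filter `daub4` is an assumed example normalized so that its coefficients sum to one. Each pass halves the grid spacing and applies the right hand side of the dilation equation once.

```python
import numpy as np

def cascade(g, iterations):
    """Iterate phi <- 2 * sum_n g(n) phi(2t - n), starting from a unit box on [0, 1).

    Returns (t, phi), where phi[k] is the value of the piecewise-constant
    iterate on [k / 2**iterations, (k + 1) / 2**iterations).
    """
    g = np.asarray(g, dtype=float)
    c = np.array([1.0])                      # samples of the initial box function
    for _ in range(iterations):
        up = np.zeros(2 * len(c) - 1)
        up[::2] = c                          # halve the grid spacing
        c = 2.0 * np.convolve(g, up)         # one pass of the dilation equation
    t = np.arange(len(c)) / 2.0 ** iterations
    return t, c

s3 = np.sqrt(3.0)
daub4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0   # assumed example filter
t, phi = cascade(daub4, 8)
step = t[1] - t[0]
print("area  :", phi.sum() * step)
print("energy:", (phi ** 2).sum() * step)
```

With a power-symmetric filter the iterates keep unit area and unit energy at every pass. Convergence of the iterates themselves is a separate question and is not guaranteed for every filter, as Example 11.2 later in this section shows.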

Eigenfunction  Condition  for  Orthonormality  [Lawton,  1990] 

Recall that Eq. (10.17) is equivalent to the orthonormality of $\{\phi(t-n)\}$. Let $S(e^{j\omega})$ denote the left
hand side of (10.17), which evidently has period $2\pi$ in $\omega$. Using the frequency domain version of the dilation
equation (10.2), it can be shown that the scaling function $\phi(t)$ generated from $G_s(e^{j\omega})$ is such that

$$\Bigl[\, |G_s(e^{j\omega})|^2 S(e^{j\omega}) \Bigr]_{\downarrow 2} = 0.5\, S(e^{j\omega}) \qquad (11.4)$$

where the notation $\downarrow 2$ indicates decimation (Sec. 4.1). Thus the function $S(e^{j\omega})$ can be regarded as an
eigenfunction (with eigenvalue = 0.5) of the operator $\mathcal{F}$ which performs filtering by $|G_s(e^{j\omega})|^2$ followed by
decimation.

Now consider the case where the digital filter bank is paraunitary, so that $G_s(e^{j\omega})$ is power symmetric,
that is satisfies (10.12). The power symmetric condition can be rewritten in the form $\bigl[\, |G_s(e^{j\omega})|^2 \bigr]_{\downarrow 2} = 0.5$.
Thus in the power symmetric case, the identity function (the constant function equal to unity) is an
eigenfunction of the operator $\mathcal{F}$ with eigenvalue 0.5. If, up to a scale factor, this is the only eigenfunction
of $\mathcal{F}$ with that eigenvalue, it then follows that $S(e^{j\omega}) = 1$, that is (10.17) holds and $\{\phi(t-n)\}$ is
orthonormal.

The FIR case. In Sec. 12 we will see that restricting $G_s(z)$ to be FIR ensures that $\phi(t)$ has finite
duration. For the FIR case, Lawton and Cohen independently showed that the above eigenfunction condition
also works in the other direction. That is, if $\{\phi(t-n)\}$ has to be orthonormal, then the trigonometric
polynomial $S(e^{j\omega})$ satisfying (11.4) has to be unique up to a scale factor.† Details can be found in [Daubechies,
1992].
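For a real FIR filter the eigenfunction condition becomes a finite matrix problem, which the sketch below illustrates. It assumes the time-domain reading of (11.4): filtering by $|G_s|^2$ is convolution with the autocorrelation $r(k) = \sum_n g_s(n) g_s(n+k)$, and decimation keeps even-indexed samples, so the operator acts on coefficients $s(-N), \ldots, s(N)$ through the matrix with entries $r(2m - l)$. The function names and the example filters are illustrative assumptions.

```python
import numpy as np

def lawton_matrix(g):
    """Matrix of 'filter by |G_s|^2, then decimate by 2' on coefficients s(-N..N)."""
    g = np.asarray(g, dtype=float)
    N = len(g) - 1
    r = np.correlate(g, g, mode="full")      # r[k + N] = sum_n g(n) g(n + k), k = -N..N
    idx = np.arange(-N, N + 1)
    T = np.zeros((2 * N + 1, 2 * N + 1))
    for a, m in enumerate(idx):
        for b, l in enumerate(idx):
            k = 2 * m - l
            if -N <= k <= N:
                T[a, b] = r[k + N]
    return T

def multiplicity_of_half(g, tol=1e-6):
    """Count eigenvalues of the matrix that equal 0.5 within tol."""
    eig = np.linalg.eigvals(lawton_matrix(g))
    return int(np.sum(np.abs(eig - 0.5) < tol))

s3 = np.sqrt(3.0)
filters = {
    "Haar, (1 + z^-1)/2": [0.5, 0.5],
    "4-tap example": np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0,
    "(1 + z^-3)/2": [0.5, 0.0, 0.0, 0.5],
}
for name, g in filters.items():
    print(name, "-> multiplicity of eigenvalue 0.5:", multiplicity_of_half(g))
```

A multiplicity of one corresponds to a unique solution of (11.4) up to scale. The filter $(1 + z^{-3})/2$ yields a multiplicity of two, consistent with its failure to generate an orthonormal $\{\phi(t-n)\}$.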

† A finite sum of the form $\sum_{n=N_1}^{N_2} a_n e^{j\omega n}$ is said to be a trigonometric polynomial. If $G_s(e^{j\omega})$ is FIR, it can
be shown that the left hand side of (10.17) is not only periodic in $\omega$, but is in fact a trigonometric polynomial.

Examples and Counter Examples

We already indicated after the introduction of Eq. (11.1), that the example of the ideal bandpass wavelet
can be generated formally by starting from the ideal brickwall paraunitary filter bank. We now discuss some
other examples.

Example 11.1. Haar basis from filter banks. A filter bank of the form Fig. 4.1(a) with analysis filters
$G_a(z)$, $H_a(z)$ and synthesis filters

$$G_s(z) = \frac{1 + z^{-1}}{2}, \qquad H_s(z) = \frac{1 - z^{-1}}{2}$$

is paraunitary. The magnitude responses of the synthesis filters, $|G_s(e^{j\omega})| = |\cos(\omega/2)|$ and $|H_s(e^{j\omega})| =
|\sin(\omega/2)|$, are shown in Fig. 11.2(a). $G_s(z)$ satisfies all the conditions of Theorem 11.5. In this case we can
evaluate the infinite products for $\Phi(\omega)$ and $\Psi(\omega)$ explicitly by using the identity $\prod_{m=1}^{\infty} \cos(2^{-m}\omega) = \sin\omega/\omega$.
The resulting $\phi(t)$ and $\psi(t)$ are as shown in Fig. 11.2 (b) and (c). These are precisely the functions that
generate the Haar orthonormal basis. ◇
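The explicit evaluation for the scaling function can be written out as follows. This is a worked sketch that assumes the product in (11.1a) has the form $\Phi(\omega) = \prod_{k=1}^{\infty} G_s(e^{j\omega/2^k})$ and the Fourier transform convention $\Phi(\omega) = \int \phi(t) e^{-j\omega t}\,dt$.

```latex
\begin{align*}
G_s(e^{j\omega}) &= \frac{1 + e^{-j\omega}}{2} = e^{-j\omega/2}\cos(\omega/2), \\
\Phi(\omega) &= \prod_{k=1}^{\infty} e^{-j\omega/2^{k+1}} \cos\!\bigl(\omega/2^{k+1}\bigr)
  = e^{-j\omega/2} \prod_{m=1}^{\infty} \cos\!\bigl(2^{-m}\,\omega/2\bigr)
  = e^{-j\omega/2}\,\frac{\sin(\omega/2)}{\omega/2}, \\
\int_0^1 e^{-j\omega t}\,dt &= \frac{1 - e^{-j\omega}}{j\omega}
  = e^{-j\omega/2}\,\frac{\sin(\omega/2)}{\omega/2}.
\end{align*}
```

The exponents sum as $\sum_{k \ge 1} 2^{-(k+1)} = 1/2$, and the last line shows that $\Phi(\omega)$ is the Fourier transform of the unit pulse on $[0, 1]$, which is the Haar scaling function.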


Fig. 11.2. Haar basis generated from paraunitary filter bank.
(a) The synthesis filters in the paraunitary filter bank,
(b) the scaling function and (c) the wavelet function generated using dilation equation.
Example 11.2. Paraunitary filter bank which does not give orthonormal wavelets. Consider
the filter bank with analysis filters $G_a(z)$, $H_a(z)$ and synthesis filters $G_s(z) = (1 + z^{-3})/2$,
$H_s(z) = (1 - z^{-3})/2$. Since this is obtained from the preceding example by the
substitution $z \to z^3$, it remains paraunitary and satisfies the perfect reconstruction property. $G_s(z)$ satisfies
all the properties of Theorem 11.5, except the fourth condition. With $\phi(t)$ and $\psi(t)$ obtained from $G_s(e^{j\omega})$
using the usual dilation equations, the functions $\{\phi(t-n)\}$ are not orthonormal. Moreover the wavelet
functions $\{2^{k/2}\psi(2^k t - n)\}$ do not form an orthonormal basis either. These statements can be verified from
the sketches of the functions $\phi(t)$ and $\psi(t)$ shown in Fig. 11.3. Clearly $\phi(t)$ and $\phi(t-1)$ are not orthogonal,
and $\psi(t)$ and $\psi(t-2)$ are not orthogonal. In this example, $\|P_n(\omega)\| = 1$ for all $n$ whereas $\|\Phi(\omega)\| = 1/\sqrt{3}$.
Since the limit of $\|P_n(\omega)\|$ does not agree with $\|\Phi(\omega)\|$, we conclude that $\Phi(\omega)$ is not the $L^2$ limit of $P_n(\omega)$.
The $L^2$ limit of $P_n(\omega)$ does not exist in this example. ◇
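These numbers can be reproduced directly. The sketch below assumes the closed form $\phi(t) = 1/3$ on $[0, 3)$, which one can check satisfies the dilation equation for $G_s(z) = (1 + z^{-3})/2$ with unit area; this closed form is our reading of the example rather than a formula stated in the text, and the helper names are illustrative. The wavelet is formed as $\psi(t) = 2\sum_n h_s(n)\phi(2t - n)$ with $h_s$ taken from $H_s(z) = (1 - z^{-3})/2$.

```python
import numpy as np

def phi(t):
    """Assumed scaling function for Example 11.2: height 1/3 on [0, 3)."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 0) & (t < 3), 1.0 / 3.0, 0.0)

def psi(t):
    """psi(t) = 2 * sum_n h_s(n) phi(2t - n) with h_s = [1/2, 0, 0, -1/2]."""
    t = np.asarray(t, dtype=float)
    return phi(2 * t) - phi(2 * t - 3)

dt = 2.0 ** -6                       # dyadic grid, so all breakpoints are samples
t = np.arange(-2.0, 8.0, dt)

def inner(x, y):
    """Riemann-sum inner product, exact here for these piecewise-constant signals."""
    return float(np.sum(x * y) * dt)

# Residual of the dilation equation phi(t) = phi(2t) + phi(2t - 3) on the grid.
residual = float(np.max(np.abs(phi(t) - (phi(2 * t) + phi(2 * t - 3)))))
print("dilation residual      :", residual)
print("||phi||^2              :", inner(phi(t), phi(t)))
print("<phi(t), phi(t - 1)>   :", inner(phi(t), phi(t - 1)))
print("<psi(t), psi(t - 2)>   :", inner(psi(t), psi(t - 2)))
```

The squared norm comes out as $1/3$, in agreement with $\|\Phi(\omega)\| = 1/\sqrt{3}$, and the two shifted inner products are nonzero.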

Fig. 11.3. Example 11.2. A paraunitary filter bank generating nonorthonormal $\{\phi(t-n)\}$.
(a) The synthesis filter response, (b) the scaling function, (c) the wavelet function, and (d) a shifted version.
Thus, a paraunitary filter bank may not generate an orthonormal wavelet basis if the fourth condition
in Theorem 11.5 is violated. However this is hardly of concern in practice, since any reasonable lowpass filter
designed for a two channel filter bank will be free from zeros in the region $[-0.5\pi, 0.5\pi]$! In fact a stronger
result has been proved by Cohen who derived necessary and sufficient conditions for an FIR paraunitary
filter bank to generate an orthonormal wavelet basis. One outcome of Cohen’s analysis is that the fourth
condition in Theorem 11.5 can be replaced by the even milder condition that $G_s(e^{j\omega})$ be not zero in $[-\pi/3, \pi/3]$.
In this sense the condition for obtaining an orthonormal wavelet basis is trivially satisfied in practice. The
case where the fourth condition fails is primarily of theoretical interest; a very cute result in this context is
Lawton’s tight frame theorem presented next.

11.4. Wavelet Tight Frames

Even though the wavelet functions $\{2^{k/2}\psi(2^k t - n)\}$ generated from a paraunitary filter bank may not form
an orthonormal basis (when the fourth condition of Theorem 11.5 is violated), they always form a tight
frame for $L^2$. Thus any $L^2$ function can still be expressed as an infinite linear combination of the functions
$\{2^{k/2}\psi(2^k t - n)\}$. More precisely we have the following result [Lawton, 1990].

Theorem 11.6. Tight frames from paraunitary filter banks. Let $G_s(e^{j\omega}) = \sum_{n=0}^{N} g_s(n) e^{-j\omega n}$
be a filter satisfying the following properties.

1. $G_s(e^{j0}) = 1$, and

2. $|G_s(e^{j\omega})|^2 + |G_s(-e^{j\omega})|^2 = 1$ (power-symmetry).

Then $\phi(t) \in L^2$. Defining the wavelet function $\psi(t)$ as usual, i.e., as in (10.10), the sequence
$\{2^{k/2}\psi(2^k t - n)\}$ (with $k$ and $n$ varying over all integers) forms a tight frame for $L^2$, with frame bound
unity (i.e., $A = B = 1$, see Sec. 8). ◇

Thus, the functions $\psi_{kn}(t)$ in Example 11.2 constitute a tight frame for $L^2$. From Sec. 8.3 we know that
this tight frame property means that any $x(t) \in L^2$ can be expressed as

$$x(t) = \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \langle x(t), \psi_{kn}(t) \rangle \, \psi_{kn}(t), \qquad (11.5)$$

where $\psi_{kn}(t) = 2^{k/2}\psi(2^k t - n)$. This expression is pretty much like an expansion into an orthonormal basis.
We can find the wavelet coefficients $c_{kn} = \langle x(t), \psi_{kn}(t) \rangle$ exactly as in the orthonormal case. We also know
that frames offer stability of reconstruction (Sec. 8.1). Thus, in every respect this resembles an orthonormal
basis, with the only difference that the functions are not linearly independent. That is, there is redundancy
in the wavelet tight frame $\{\psi_{kn}(t)\}$.
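A finite-dimensional analogy may help fix the idea. The sketch below is not the wavelet frame itself: it uses three vectors in the plane (a hypothetical example, spaced 120 degrees apart and scaled by $\sqrt{2/3}$) that form a tight frame with frame bound unity. Expansion and reconstruction look exactly like the orthonormal case, yet the vectors are linearly dependent and have norm less than one.

```python
import numpy as np

# Three planar vectors, 120 degrees apart, scaled so that the frame bound is 1.
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
frame = np.sqrt(2.0 / 3.0) * np.column_stack([np.cos(angles), np.sin(angles)])

def analyze(x):
    """Frame coefficients c_k = <x, v_k>."""
    return frame @ x

def synthesize(c):
    """Reconstruction sum_k c_k v_k, the same formula as for an orthonormal basis."""
    return frame.T @ c

x = np.array([0.3, -1.7])
c = analyze(x)
print("coefficients  :", c)
print("reconstruction:", synthesize(c))
print("energy        :", np.sum(c ** 2), "vs", np.sum(x ** 2))
print("rank of frame :", np.linalg.matrix_rank(frame), "with", len(frame), "vectors")
```

The coefficient energy equals the signal energy ($A = B = 1$), while the rank shows that the three vectors are redundant.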
12. COMPACTLY SUPPORTED ORTHONORMAL WAVELETS

In Sec. 11 we showed how to construct an orthonormal wavelet basis for $L^2$ space by starting from a
paraunitary filter bank. Essentially we defined two infinite products $\Phi(\omega)$ and $\Psi(\omega)$ starting from the
digital lowpass filter $G_s(e^{j\omega})$. Under some mild conditions on $G_s(e^{j\omega})$, the products converge (Theorem
11.2). Under the further condition that $G_s(e^{j\omega})$ be power symmetric and nonzero in $[-0.5\pi, 0.5\pi]$ we saw
that $\{\phi(t-k)\}$ forms an orthonormal set, and the corresponding $\{2^{k/2}\psi(2^k t - n)\}$ forms an orthonormal
wavelet basis for $L^2$ (Theorem 11.5). We will now see that if we further constrain $G_s(e^{j\omega})$ to be FIR, that
is $G_s(z) = \sum_{n=0}^{N} g_s(n) z^{-n}$, then the scaling function $\phi(t)$ and the wavelet function $\psi(t)$ have finite duration
[Daubechies, 1988, 1992].

Theorem 12.1. Let $G_s(z) = \sum_{n=0}^{N} g_s(n) z^{-n}$, with $G_s(e^{j0}) = 1$ and $H_s(e^{j\omega}) = e^{j\omega} G_s^*(-e^{j\omega})$. Define
the infinite products as in (11.1a), (11.1b), and assume that the limits $\Phi(\omega)$ and $\Psi(\omega)$ are $L^2$ functions (for
example, by imposing power symmetry condition on $G_s(z)$ as in Theorem 11.3). Then $\phi(t)$ and $\psi(t)$ (the
inverse FTs) are compactly supported, with support in $[0, N]$. ◇

The time-decay of the wavelet $\psi(t)$ is therefore excellent. In particular, all the basis functions
$\{2^{k/2}\psi(2^k t - n)\}$ are compactly supported. By further restricting the lowpass filter $G_s(z)$ to have a sufficient
number of zeros at $\omega = \pi$, we will also ensure (Sec. 13) that the Fourier transform $\Psi(\omega)$ has excellent decay
(equivalently $\psi(t)$ is regular or smooth in the sense to be quantified in Sec. 13).

The rest of this section is devoted to the technical details of the above result. The reader not interested
in these details can skip to Sec. 13 without loss of continuity. The Theorem might seem “obvious” at
first sight, and indeed a simple engineering argument based on Dirac delta functions can be given (p. 521
[Vaidyanathan, 1993]). However the correct mathematical justification relies on a number of deep results in
function theory. One of these is the celebrated Paley-Wiener theorem for bandlimited functions.

The Paley-Wiener Theorem

A beautiful result in the theory of signals is that if an $L^2$ function $f(t)$ is bandlimited, that is $F(\omega) =
0$, $|\omega| > \sigma$, then $f(t)$ is the “real-axis restriction of an entire function.” Let us first explain the meaning of
this phrase. We say that a function $f(s)$ of the complex variable $s$ is entire if it is analytic for all $s$. Examples
are polynomials in $s$, exponentials such as $e^s$, and simple combinations of these. The function $f(t)$ obtained
from $f(s)$ for real values of $s$ ($s = t$) is the real-axis restriction of $f(s)$.

Thus if $f(t)$ is a bandlimited signal then there exists an entire function $f(s)$ such that its real axis
restriction is $f(t)$. In particular, therefore, a bandlimited function $f(t)$ is continuous and infinitely
differentiable with respect to the time variable $t$. The entire function $f(s)$ associated with the bandlimited function
has the further property $|f(s)| \le c e^{\sigma|s|}$ for some $c > 0$. We express this by saying that $f(s)$ is exponentially
bounded or of the exponential type. What is even more interesting is that the converse of this result is true,
that is, if $f(s)$ is an entire function of the exponential type, and the real axis restriction $f(t)$ is in $L^2$, then
$f(t)$ is bandlimited. By interchanging the time and frequency variables we can obtain similar conclusions for
time-limited signals; this is what we need in the discussion of time limited (compactly supported) wavelets.

Theorem 12.2 (Paley-Wiener). Let $W(s)$ be an entire function such that (a) for all $s$, we have
$|W(s)| \le c \exp(A|s|)$ for some $c, A > 0$, and (b) the real axis restriction $W(\omega)$ is in $L^2$. Then there exists a
function $w(t)$ in $L^2[-A, A]$ such that $W(s) = \int_{-A}^{A} w(t) e^{-jst} \, dt$. ◇


A proof can be found in [Rudin, 1966]. Thus $w(t)$ can be regarded as a compactly supported function
with support in $[-A, A]$. Recall [Eq. (6.1)] that $L^2[-A, A] \subset L^1[-A, A]$, so $w(t)$ is in $L^1[-A, A]$ and
$L^2[-A, A]$. So $W(\omega)$ is the $L^1$-FT of $w(t)$, and agrees with the $L^2$-FT a.e.

Our aim is to show that the infinite product for $\Phi(\omega)$ satisfies the conditions of the Paley-Wiener
theorem, and therefore that $\phi(t)$ is compactly supported. A modified version of the above result is more
convenient for this. The modification allows the support to be more general, namely $[-A_2, A_1]$, and permits
us to work with the imaginary part of $s$ rather than the absolute value.


Theorem 12.3 (Paley-Wiener, modified). Let $W(s)$ be an entire function such that

$$|W(s)| \le \begin{cases} c_1 \exp\bigl(A_1 |\mathrm{Im}\, s|\bigr), & \mathrm{Im}\, s \ge 0 \\ c_2 \exp\bigl(A_2 |\mathrm{Im}\, s|\bigr), & \mathrm{Im}\, s \le 0 \end{cases} \qquad (12.1)$$

for some $c_1, c_2, A_1, A_2 \ge 0$, and such that the real axis restriction $W(\omega)$ is in $L^2$. Then there exists a function
$w(t)$ in $L^2[-A_2, A_1]$ such that $W(s) = \int_{-A_2}^{A_1} w(t) e^{-jst} \, dt$. We can regard $W(\omega)$ as the Fourier transform of the
function $w(t)$ supported in $[-A_2, A_1]$. ◇
This result can be made more general; the condition (12.1) can be replaced with one where the right
hand sides have the form $P_i(s) \exp(A_i |\mathrm{Im}\, s|)$ where $P_i(s)$ are polynomials. We are now ready to sketch the
proof that $\phi(t)$ and $\psi(t)$ have the compact support $[0, N]$.

1. Using the fact that $G_s(z)$ is FIR and that $G_s(e^{j0}) = 1$, show that the product $\prod_{k=1}^{\infty} G_s(e^{js/2^k})$ converges
uniformly on any compact set of the complex $s$-plane. (For real $s$, namely $s = \omega$, this holds even for the
IIR case as long as $\sum_n |n g_s(n)|$ converges. This was shown in Theorem 11.2.)

2. Uniformity of convergence of the product guarantees that its limit $\Phi(s)$ is an entire function of the
complex variable $s$ (Theorem 10.28, [Rudin, 1966]).

3. The FIR nature of $G_s(z)$ allows us to establish the exponential bound (12.1) for $\Phi(s)$ with $A_2 = 0$ and
$A_1 = N$. This shows that $\phi(t)$ is compactly supported in $[0, N]$. Since $\psi(t)$ is obtained from the dilation
equation (10.10), the same result follows for $\psi(t)$ as well.
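The three steps can be traced explicitly in the simplest case. The following worked sketch assumes the Haar filter $G_s(z) = (1 + z^{-1})/2$, for which $N = 1$, and uses the closed form of its infinite product extended to complex $s$; it is meant only as an illustration of the bound (12.1), not as the general argument.

```latex
\begin{align*}
\Phi(s) &= e^{-js/2}\,\frac{\sin(s/2)}{s/2}, \qquad
|e^{-js/2}| = e^{(\mathrm{Im}\, s)/2}, \\
\left|\frac{\sin z}{z}\right| &= \left|\int_0^1 \cos(zu)\,du\right|
  \le \max_{0 \le u \le 1} \cosh(u\,\mathrm{Im}\, z) \le e^{|\mathrm{Im}\, z|}, \\
|\Phi(s)| &\le e^{(\mathrm{Im}\, s)/2}\, e^{|\mathrm{Im}\, s|/2}
  = \begin{cases} \exp\bigl(|\mathrm{Im}\, s|\bigr), & \mathrm{Im}\, s \ge 0 \\ 1, & \mathrm{Im}\, s \le 0. \end{cases}
\end{align*}
```

This is (12.1) with $c_1 = c_2 = 1$, $A_1 = 1 = N$ and $A_2 = 0$, so Theorem 12.3 places the support of $\phi(t)$ in $[0, 1]$, as expected for the Haar scaling function.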




13.  WAVELET  REGULARITY 


From the preceding section we know that if we construct the power-symmetric FIR filter $G_s(z)$ properly, then
we can get an orthonormal multiresolution basis $\{\phi(t-n)\}$, and an orthonormal wavelet basis
$\{2^{k/2}\psi(2^k t - n)\}$ for $L^2$. Both of these bases are compactly supported. These are solutions to the two-scale
dilation equations

$$\phi(t) = 2 \sum_{n=0}^{N} g_s(n) \phi(2t - n), \qquad (13.1)$$

$$\psi(t) = 2 \sum_{n=0}^{N} h_s(n) \phi(2t - n), \qquad (13.2)$$

where $h_s(n) = (-1)^{n+1} g_s^*(-n-1)$. In the frequency domain we have the explicit infinite product expressions
(11.1) connecting the filters $G_s(z)$ and $H_s(z)$ to the Fourier transforms $\Phi(\omega)$ and $\Psi(\omega)$.

Fig. 13.1(a) shows two cases of a 9th order FIR filter $G_s(e^{j\omega})$ used to generate the compactly supported
wavelet. The resulting wavelets are shown in Figs. 13.1(b) and (c). In both cases, all conditions of Theorem
11.5 are satisfied so we obtain orthonormal wavelet bases for $L^2$. The filter $G_s(e^{j\omega})$ has more zeros at $\pi$ for
Case 2 than for Case 1. The corresponding wavelet looks much smoother or “regular”; this is an example
of a Daubechies wavelet (Sec. 13.4). It turns out that, by designing $G_s(z)$ to have a sufficient number of
zeros at $\pi$ we can make the wavelet “as regular as we please.” A quantitative discussion of the connection
between the number of zeros at $\pi$ and the smoothness of $\psi(t)$ will be given in the following sections.

Qualitatively, the idea is this. If $G_s(e^{j\omega})$ has a large number of zeros at $\pi$, then the function $\Phi(\omega)$
given by the infinite product (11.1a) decays “fast” as $\omega \to \infty$. This fast asymptotic decay in the frequency
domain implies that the time function $\phi(t)$ is “smooth”. And since $\psi(t)$ is derived from $\phi(t)$ using a finite
sum (13.2), the smoothness of $\phi(t)$ is transmitted to $\psi(t)$. In the next few sections we will make the ideas
more quantitative. References for this section include [Daubechies and Lagarias, 1991], [Daubechies, 1992]
and [Rioul, 1992].

Why  regularity? 

The point made above was that if we design an FIR paraunitary filter bank with the additional constraint
that the lowpass filter $G_s(e^{j\omega})$ have a sufficient number of zeros at $\pi$, the wavelet basis functions $\psi_{kn}(t)$ are
sufficiently smooth. The smoothness requirement is perhaps the main new component brought into the filter
bank theory from the wavelet theory. Its importance can be understood in a number of ways.

Fig. 13.1. (a) The FIR filter $G_s(z)$ for two cases, and (b), (c) the corresponding wavelets.


Consider the expansion $x(t) = \sum_{k,n} c_{kn} \psi_{kn}(t)$. Suppose we truncate this to a finite number of terms
as is often done in practice. If the basis functions are not smooth, then the error can produce perceptually
annoying effects in applications such as audio and image coding, even though the $L^2$ norm of the error might
be small.

Consider next a tree structured filter bank. An example is shown in Fig. 4.6. In the synthesis bank,
the first path can be regarded as an effective interpolation filter, that is, an expander (e.g., $\uparrow 8$ in Fig.
4.6) followed by a filter of the form $G_s(e^{j\omega}) G_s(e^{j2\omega}) \cdots G_s(e^{j2^i\omega})$. This same finite product can be
obtained by truncating to $i + 1$ terms the infinite product defining $\Phi(\omega)$ [Eq. (11.1)], and making a change
of variables. Similarly the remaining paths can be related to interpolation filters which are various truncated
versions of the infinite product defining $\Psi(\omega)$ in Eq. (11.1). Imagine now that we use the tree structured
system in subband coding. The quantization error in each subband is filtered through an interpolation filter.
If the impulse response of the interpolation filter is not smooth enough (e.g., if it resembles Fig. 13.1(b)),
then the filtered noise tends to show severe perceptual effects, for example in image reconstruction. This
explains, qualitatively, the importance of having “smooth impulse responses” for the synthesis filters.

13.1. Smoothness and Hölder Regularity Index

We are familiar with the notion of continuous functions. We say that $f(t)$ is continuous at $t_0$ if, for any
$\epsilon > 0$ we can find a $\delta > 0$ such that $|f(t) - f(t_0)| < \epsilon$ for all $t$ satisfying $|t - t_0| < \delta$. A stronger type
of continuity, called Hölder continuity, is defined as follows: $f(t)$ is Hölder continuous in a region $S$ if
$|f(t_0) - f(t_1)| \le c|t_0 - t_1|^{\beta}$ for some $c, \beta > 0$, for all $t_0, t_1 \in S$. This implies, in particular, continuity in
the ordinary sense. If $\beta > 1$ the above would imply that $f(t)$ is constant on $S$. For this reason, we have
the restriction $0 < \beta \le 1$. As $\beta$ increases from 0 to 1, the function becomes “smoother and smoother”. The
constant $\beta$ is called the Lipschitz constant of the function $f(t)$.
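The exponent $\beta$ can be estimated numerically from how the largest increment of a function shrinks with the step size. The sketch below is a rough illustration under stated assumptions: the function name `holder_exponent_estimate` is hypothetical, the estimate uses only dyadic steps on a finite grid, and it can only detect exponents in the range $0 < \beta \le 1$.

```python
import numpy as np

def holder_exponent_estimate(f, a, b, levels=range(4, 12)):
    """Slope of log(max |f(t + h) - f(t)|) versus log(h) for h = 2^-j on [a, b]."""
    log_h, log_inc = [], []
    for j in levels:
        h = 2.0 ** -j
        t = np.arange(a, b, h)               # dyadic grid over [a, b)
        inc = np.max(np.abs(f(t + h) - f(t)))
        log_h.append(np.log(h))
        log_inc.append(np.log(inc))
    slope, _ = np.polyfit(log_h, log_inc, 1)
    return float(slope)

print(holder_exponent_estimate(lambda t: np.sqrt(np.abs(t)), -1.0, 1.0))
print(holder_exponent_estimate(lambda t: np.abs(t), -1.0, 1.0))
```

For $f(t) = |t|^{1/2}$ the largest increment is $h^{1/2}$, attained next to $t = 0$, so the estimate is 0.5. For $f(t) = |t|$ the largest increment is $h$ and the estimate is 1.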

Suppose the function $f(t)$ is $n$ times differentiable in some region $S$ and the $n$th derivative $f^{(n)}(t)$ is
Hölder continuous with Lipschitz constant $\beta$. Define $\alpha = n + \beta$. We say that $f(t)$ belongs to the class $C^{\alpha}$.
The coefficient $\alpha$ is called the Hölder regularity index of $f(t)$. For example, $C^{3.4}$ is the class of functions
that are three times differentiable and the third derivatives are Hölder continuous with Lipschitz constant
equal to 0.4.

The Hölder regularity index $\alpha$ is taken as a quantitative measure of regularity or smoothness of the
function $\psi(t)$. We sometimes say $\psi(t)$ has regularity $\alpha$. Qualitatively speaking, a function with a large
Hölder index is regarded as more “smooth” or “well-behaved”. Since the dilation equations in the FIR case
are finite summations, the Hölder indices of $\phi(t)$ and $\psi(t)$ are identical.

There exist functions which are differentiable an infinite number of times. That is, they belong to $C^{\infty}$.
Examples are $e^t$, $\sin t$, and polynomials. There even exist $C^{\infty}$ functions that are compactly supported (i.e.,
have finite duration); however we will not have occasion to encounter these.

13.2.  Frequency-Domain  Decay  and  Time-Domain  Smoothness 

We can obtain time-domain smoothness of a certain degree by imposing certain conditions on the Fourier
transform $\Psi(\omega)$. This is made possible by the fact that the rate of decay of $\Psi(\omega)$ as $\omega \to \infty$ (i.e., the
asymptotic decay) governs the Hölder regularity index $\alpha$ of $\psi(t)$. Suppose $\Psi(\omega)$ decays faster than
$(1 + |\omega|)^{-(1+\alpha)}$. That is,

$$|\Psi(\omega)| \le \frac{c}{(1 + |\omega|)^{1+\alpha+\epsilon}}$$

for some $c > 0$, $\epsilon > 0$. Then $\Psi(\omega)(1 + |\omega|)^{\alpha}$ is bounded by the integrable function $c/(1 + |\omega|)^{1+\epsilon}$, and is
therefore (Lebesgue) integrable. It can be shown using standard Fourier theory that this implies $\psi(t) \in C^{\alpha}$.
In the wavelet construction of Sec. 11 which begins with a digital filter bank, the above decay of $\Psi(\omega)$ can
be accomplished by designing the digital filter $G_s(e^{j\omega})$ such that it has a sufficient number of zeros at $\omega = \pi$
(Sec. 13.4).

Thus, the decay in the frequency domain translates into regularity in the time domain. Similarly one
can regard time-domain decay as an indication of smoothness in frequency. When comparing two kinds of
wavelets, we can usually compare them in terms of time domain regularity (frequency domain decay) and
time domain decay (frequency domain smoothness). An extreme example is where $\psi(t)$ is bandlimited. This
means that $\Psi(\omega)$ is zero outside the passband, and so the “decay” is the best possible. Correspondingly
the smoothness of $\psi(t)$ is excellent; in fact $\psi(t) \in C^{\infty}$. However, the decay of $\psi(t)$ may not be excellent
(certainly it cannot be time limited if it is band limited).

Return to the two familiar wavelet examples, namely the Haar wavelet (Fig. 2.12) and the bandpass
wavelet (Figs. 2.9, 2.11). We see that the Haar wavelet has poor decay in the frequency domain since
$\Psi(\omega)$ decays only as $1/\omega$. Correspondingly the time domain signal $\psi(t)$ is not even continuous, hence not
differentiable.† The bandpass wavelet on the other hand is band limited (so the decay in frequency is
excellent). Thus $\psi(t) \in C^{\infty}$, but it decays slowly, behaving like $1/t$ for large $t$. These two examples represent
two extremes of orthonormal wavelet bases for $L^2$.
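The slow frequency-domain decay of the Haar wavelet can be seen from a direct computation. The sketch below assumes the Haar wavelet is $+1$ on $[0, 0.5)$ and $-1$ on $[0.5, 1)$, together with the convention $\Psi(\omega) = \int \psi(t) e^{-j\omega t}\,dt$.

```latex
\begin{align*}
\Psi(\omega) &= \int_0^{1/2} e^{-j\omega t}\,dt - \int_{1/2}^{1} e^{-j\omega t}\,dt
  = \frac{\bigl(1 - e^{-j\omega/2}\bigr)^2}{j\omega}, \\
|\Psi(\omega)| &= \frac{4 \sin^2(\omega/4)}{|\omega|} \le \frac{4}{|\omega|},
  \qquad \text{with equality at } \omega = 2\pi(2m+1).
\end{align*}
```

Since the envelope is exactly $4/|\omega|$, a bound of the form $c/(1 + |\omega|)^{1+\alpha+\epsilon}$ with $\epsilon > 0$ cannot hold for any $\alpha \ge 0$, which is consistent with $\psi(t)$ being discontinuous.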

The game, therefore, is to construct wavelets that have good decay in time as well as good regularity in
time. An extreme hope is where $\psi(t) \in C^{\infty}$ and has compact support as well. It can be shown that such
$\psi(t)$ can never give rise to an orthonormal basis (see Sec. 13.3 for a more precise statement), so we have to
strike a compromise between regularity in time and decay in time.

† It is true that $\psi(t)$ is differentiable almost everywhere. But the discontinuities at the points $t = 0, 0.5, 1.0$
will be very noticeable if we take linear combinations like $\sum_{k,n} c_{kn} \psi_{kn}(t)$.

Regularity  and  Decay  in  Early  Wavelet  Constructions 

In 1982 Stromberg showed how to construct wavelets such that $\psi(t)$ has exponential decay, and at the
same time has arbitrary regularity (i.e., $\psi(t) \in C^k$ for any chosen integer $k$). In 1985 Meyer constructed
wavelets with bandlimited $\psi(t)$ (so $\psi(t) \in C^\infty$ as for the bandpass wavelet), but he also showed how to
design this $\psi(t)$ to decay faster than any chosen inverse polynomial, as $t \to \infty$. Figure 13.2(a) shows an
example of a Meyer wavelet; a detailed description of this wavelet can be found in [Daubechies, 1992]. In
both of the above constructions, the wavelets gave rise to orthonormal bases for $L^2$.


Fig. 13.2. (a) An example of the Meyer wavelet, and
(b) an example of the Battle-Lemarie wavelet.

In 1987 and 1988, Battle and Lemarie constructed, independently, wavelets with properties similar to those of
Stromberg's wavelets, namely $\psi(t) \in C^k$ for arbitrary $k$, and $\psi(t)$ decays exponentially. Their construction is
based on spline functions and an orthonormalization step, as described in Sec. 10.5. The resulting wavelets,
while not compactly supported, decay exponentially and generate orthonormal bases. Fig. 13.2(b) shows an
example of the Battle-Lemarie wavelet.

Table 13.1 gives a summary of the main features of these early wavelet constructions (first three entries).
When these examples were constructed, the relation between wavelets and digital filter banks was not known.
The constructions were not systematic, or unified by a central theory. Moreover it was not clear whether
one could get a compactly supported (i.e., finite-duration) wavelet which at the same time has arbitrary
regularity (i.e., $\psi(t) \in C^k$ for any chosen $k$), and generates an orthonormal wavelet basis. This was made
possible for the first time when the relation between wavelets and digital filter banks was observed by
Daubechies [1988]. Simultaneously and independently Mallat invented the multiresolution framework and
observed the relation between his framework, wavelets, and paraunitary digital filter banks (the CQF bank,
Sec. 4). These discoveries have made the wavelet construction easy and systematic, as described earlier in
Sec. 11-12. The way to obtain arbitrary wavelet regularity with this scheme is described next.


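Type of wavelet                     | Decay of $\psi(t)$ in time                  | Regularity of $\psi(t)$ in time                                         | Type of wavelet basis
------------------------------------|---------------------------------------------|-------------------------------------------------------------------------|----------------------
Stromberg, 1982                     | Exponential                                 | $\psi(t)$ in $C^k$, $k$ can be chosen arbitrarily large                 | Orthonormal
Meyer, 1985                         | Faster than any chosen inverse polynomial   | $\psi(t)$ in $C^\infty$ (bandlimited)                                   | Orthonormal
Battle-Lemarie, 1987, 88 (splines)  | Exponential                                 | $\psi(t)$ in $C^k$, $k$ can be chosen arbitrarily large                 | Orthonormal
Daubechies, 1988                    | Compactly supported                         | $\psi(t)$ in $C^\alpha$, $\alpha$ can be chosen as large as we please   | Orthonormal

Table 13.1. Summary of several types of wavelet bases for $L^2(\mathcal{R})$.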


13.3.  Time-Domain  Decay  and  Time-Domain  Regularity 

We will now state a fundamental limitation which arises when trying to impose regularity and decay
simultaneously [Daubechies, 1992].

Theorem 13.1. Vanishing Moments. Let $\{2^{k/2}\psi(2^k t - n)\}$, $-\infty < k, n < \infty$, be an orthonormal set
in $L^2$. Suppose the wavelet $\psi(t)$ satisfies the following properties.

1. $|\psi(t)| \le c(1 + |t|)^{-(m+1+\epsilon)}$ for some integer $m$ and some $\epsilon > 0$. That is, the wavelet decays faster than
$(1 + |t|)^{-(m+1)}$.
2. $\psi(t) \in C^m$ (i.e., $\psi(t)$ differentiable $m$ times), and the $m$ derivatives are bounded.

Then the moments of $\psi(t)$ up to order $m$ are zero, that is, $\int t^i \psi(t)\,dt = 0$ for $0 \le i \le m$. ◊

Impossibility of Compact Support, Infinite Differentiability, and Orthonormality. Suppose
we have an orthonormal wavelet basis such that $\psi(t)$ is compactly supported, and infinitely differentiable
(i.e., $\psi(t) \in C^\infty$). Then all the conditions of Theorem 13.1 are satisfied for every integer $m$. So all the
moments of $\psi(t)$ are zero, and therefore $\psi(t) = 0$ for all $t$, violating the unit-norm property of $\psi(t)$. We cannot,
therefore, design compactly supported orthonormal wavelets which are infinitely differentiable; only a finite
Holder index can be accomplished. A similar observation can be made even when $\psi(t)$ is not compactly supported
as long as it decays faster than any inverse polynomial (e.g., exponential decay).

The vanishing moment condition $\int t^i \psi(t)\,dt = 0$, $0 \le i \le m$, implies that the $L^1$ Fourier transform $\Psi(\omega)$
has $m + 1$ zeros at $\omega = 0$. This follows by using standard theorems on the $L^1$-FT [Rudin, 1966].† Thus,
$\Psi(\omega)$ and its first $m$ derivatives vanish at $\omega = 0$. This implies a certain degree of flatness at $\omega = 0$. Summarizing,
we have:

Theorem 13.2. Flatness in Frequency and Regularity in Time. Suppose we have a compactly
supported $\psi(t)$ generating an orthonormal wavelet basis $\{2^{k/2}\psi(2^k t - n)\}$, and let $\psi(t) \in C^m$, with $m$
derivatives bounded. Then $\Psi(\omega)$ has $m + 1$ zeros at $\omega = 0$. ◊
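The link between vanishing moments and zeros of $\Psi(\omega)$ can be written out as a short derivation. This is only a sketch, assuming the Fourier transform convention $\Psi(\omega) = \int \psi(t) e^{-j\omega t}\,dt$ (the one consistent with (A.1)) and compact support of $\psi(t)$, so that differentiation under the integral sign is permitted:

```latex
\Psi(\omega) = \int \psi(t)\, e^{-j\omega t}\, dt
\quad\Longrightarrow\quad
\left. \frac{d^i \Psi(\omega)}{d\omega^i} \right|_{\omega = 0}
  = (-j)^i \int t^i\, \psi(t)\, dt = 0,
\qquad 0 \le i \le m .
```

Under these assumptions the Taylor expansion of $\Psi(\omega)$ around $\omega = 0$ begins with the $\omega^{m+1}$ term, which is the sense in which $\Psi(\omega)$ has $m + 1$ zeros at the origin.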

Return now to the wavelet construction technique described in Sec. 11. We started from a paraunitary
FIR filter bank (Fig. 4.1(a)) and obtained the scaling function $\phi(t)$ and wavelet function $\psi(t)$ as in (13.1)
and (13.2). The FIR nature implies that $\psi(t)$ has compact support (Sec. 12). With the mild conditions of
Theorem 11.5 satisfied, we have an orthonormal wavelet basis for $L^2$. We see that if the wavelet $\psi(t)$ has
Holder index $\alpha$, then it satisfies all the conditions of Theorem 13.2 where $m$ is the integer part of $\alpha$. Thus
$\Psi(\omega)$ has $m + 1$ zeros at $\omega = 0$. But since $\Phi(0) \ne 0$ (Section 10.4), we conclude from the dilation equation
$\Psi(\omega) = H_s(e^{j\omega/2})\Phi(\omega/2)$ that the highpass FIR filter $H_s(z)$ has $m + 1$ zeros at $\omega = 0$ (i.e., at $z = 1$). Using
the relation $H_s(e^{j\omega}) = e^{j\omega} G_s^*(-e^{j\omega})$ we conclude that $G_s(e^{j\omega})$ has $m + 1$ zeros at $\omega = \pi$. That is, the lowpass
FIR filter $G_s(z)$ has the form $G_s(z) = (1 + z^{-1})^{m+1} F(z)$ where $F(z)$ is FIR. Summarizing, we have:

Theorem 13.3. Zeros at $\pi$ and regularity. Suppose we wish to design a compactly supported
orthonormal wavelet basis for $L^2$ by designing an FIR filter $G_s(z)$ satisfying the conditions of Theorem
11.5. If $\psi(t)$ has to have the Holder regularity index $\alpha$ then it is necessary that $G_s(z)$ have the form
$G_s(z) = (1 + z^{-1})^{m+1} F(z)$ where $F(z)$ is FIR, and $m$ is the integer part of $\alpha$. ◊
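The factor $(1 + z^{-1})^{m+1}$ can be detected numerically by repeated polynomial division. The following sketch is not part of the original development; `zeros_at_pi` is a hypothetical helper, and the two test filters (the Haar filter and the well-known 4-tap Daubechies filter) are scaled so that their coefficients sum to one, matching $G_s(e^{j0}) = 1$.

```python
import numpy as np

def zeros_at_pi(g, tol=1e-9):
    """Count how many factors (1 + z^-1) divide the FIR filter with coefficients g."""
    g = np.asarray(g, dtype=float)
    count = 0
    while len(g) > 1:
        quotient, remainder = np.polydiv(g, [1.0, 1.0])
        if np.max(np.abs(remainder)) > tol:
            break                      # (1 + z^-1) no longer divides the filter
        g = quotient
        count += 1
    return count

s3 = np.sqrt(3.0)
haar = np.array([1.0, 1.0]) / 2.0
daub4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0

print("Haar filter, zeros at pi:", zeros_at_pi(haar))
print("4-tap Daubechies filter, zeros at pi:", zeros_at_pi(daub4))
```

Because the root $z = -1$ is its own reciprocal, the count is the same whether the coefficient array is read in ascending or descending powers.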

One zero at $\pi$ is essential. From Theorem 11.2 we know that we need to have $G_s(e^{j0}) = 1$ for the
infinite product (11.1a) to converge. Theorem 11.5 imposes further conditions which enable us to obtain
an orthonormal wavelet basis for $L^2$. One of these conditions is the power symmetric property
$|G_s(e^{j\omega})|^2 + |G_s(-e^{j\omega})|^2 = 1$. Together with $G_s(e^{j0}) = 1$ this implies $G_s(e^{j\pi}) = 0$. Thus, it is necessary to have at
least one zero of $G_s(e^{j\omega})$ at $\pi$. The filter which generates the Haar basis (Example 11.1) has exactly one
zero at $\pi$. But the Haar wavelet $\psi(t)$ is not even continuous. If we desire increased regularity (continuity,
differentiability, ...), we need to put additional zeros at $\pi$, as the above theorem shows.

† Since $\psi(t) \in L^2$ and has compact support, $\psi(t) \in L^1$ as well.
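The argument that power symmetry together with $G_s(e^{j0}) = 1$ forces a zero at $\pi$ can be checked on concrete filters. This is a minimal sketch under the normalization used above (coefficients summing to one); the filter coefficients and the name `freq_response` are chosen here for illustration.

```python
import numpy as np

def freq_response(g, omega):
    """Evaluate G_s(e^{j omega}) = sum_n g[n] e^{-j omega n} on an array of frequencies."""
    n = np.arange(len(g))
    return np.exp(-1j * np.outer(omega, n)) @ np.asarray(g, dtype=float)

s3 = np.sqrt(3.0)
filters = {
    "haar": np.array([1.0, 1.0]) / 2.0,
    "daubechies_4tap": np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0,
}

omega = np.linspace(0.0, np.pi, 257)
report = {}
for name, g in filters.items():
    # -e^{j omega} equals e^{j(omega + pi)}, so shift the frequency grid by pi.
    power_sum = (np.abs(freq_response(g, omega)) ** 2
                 + np.abs(freq_response(g, omega + np.pi)) ** 2)
    at_zero = freq_response(g, np.array([0.0]))[0]
    at_pi = freq_response(g, np.array([np.pi]))[0]
    report[name] = (np.max(np.abs(power_sum - 1.0)), abs(at_zero), abs(at_pi))
    print(name, report[name])
```

Each tuple holds the worst deviation from power symmetry, $|G_s(e^{j0})|$, and $|G_s(e^{j\pi})|$.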

Design techniques for paraunitary filter banks do not automatically yield filters which have zero(s) at
$\pi$. This condition has to be incorporated separately. The maximally flat filter bank solution (Sec. 4.6) does
satisfy this property, and in fact even allows us to specify the number of zeros at $\pi$.

13.4.  Wavelets  With  Specified  Regularity 

The fundamental connection between digital filter banks and continuous time wavelets, elaborated in the
preceding sections, allows us to construct the scaling function $\phi(t)$ and the wavelet function $\psi(t)$ with
specified regularity index $\alpha$. If $G_s(z)$ has a certain number of zeros at $\pi$, this translates into the Holder
regularity index $\alpha$. We will see that what really matters is not only the number of zeros at $\pi$, but also the
order of the FIR filter $G_s(z)$.

For a given order $N$ of the filter $G_s(z)$, suppose we wish to put as many of its zeros as possible at $\pi$.
Let this number be $K$. What is the largest possible $K$? We can't have all $N$ zeros at $\pi$ because we have
imposed the power symmetric condition on $G_s(z)$. The best we can do is to put all the unit-circle zeros at
$\pi$. The power symmetric condition says that $G(z) = \tilde{G}_s(z) G_s(z)$ is a half-band filter. This filter has order
$2N$, with $2K$ zeros at $\pi$. Since we wish to maximize $K$ for fixed $N$, the solution for $G(z)$ is the maximally
flat FIR filter (Fig. 4.5), given in (4.23). As the filter in (4.23) has $2K$ zeros at $\pi$ and order $2N = 4K - 2$
we conclude that $K = (N + 1)/2$. For example if $G_s(z)$ is a fifth order power symmetric filter it can have at
most three zeros at $\pi$.
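The counting argument $2N = 4K - 2$ can be illustrated numerically. Equation (4.23) is not reproduced in this section, so the sketch below assumes the standard closed form of the maximally flat half-band filter, $G(e^{j\omega}) = \cos^{2K}(\omega/2)\,P(\sin^2(\omega/2))$ with $P(y) = \sum_{k=0}^{K-1} \binom{K-1+k}{k} y^k$, which is taken here to agree with (4.23) up to normalization. The function names are hypothetical.

```python
import numpy as np
from math import comb

def maxflat_halfband(K):
    """Centered coefficients of G(z) with 2K zeros at pi and order 4K - 2."""
    a = np.array([1.0, 2.0, 1.0]) / 4.0      # cos^2(w/2) = (z + 2 + 1/z) / 4
    b = np.array([-1.0, 2.0, -1.0]) / 4.0    # sin^2(w/2) = (-z + 2 - 1/z) / 4
    p = np.zeros(2 * K - 1)                  # P(sin^2(w/2)), P of degree K - 1
    term = np.array([1.0])                   # running power b^k
    for k in range(K):
        pad = (len(p) - len(term)) // 2
        p += comb(K - 1 + k, k) * np.pad(term, pad)
        term = np.convolve(term, b)
    g = p
    for _ in range(K):
        g = np.convolve(g, a)                # multiply by cos^(2K)(w/2)
    return g

def zeros_at_pi(g, tol=1e-8):
    """Count how many factors (1 + z^-1) divide the filter with coefficients g."""
    g = np.asarray(g, dtype=float)
    count = 0
    while len(g) > 1:
        quotient, remainder = np.polydiv(g, [1.0, 1.0])
        if np.max(np.abs(remainder)) > tol:
            break
        g, count = quotient, count + 1
    return count

for K in range(1, 5):
    g = maxflat_halfband(K)
    N = 2 * K - 1
    print(f"K = {K}: order of G = {len(g) - 1} (2N = {2 * N}), "
          f"zeros at pi = {zeros_at_pi(g)}")
```

The center tap sits at the odd index $2K - 1$, so the half-band property shows up as `g[1::2]` being zero everywhere except for the value 1/2 at the center.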

The  20%  Regularity  Rule 

Suppose $G_s(z)$ has been designed to be FIR power symmetric of order $N$, with the number $K$ of zeros at
$\pi$ adjusted to be maximum (i.e., $K = (N + 1)/2$). Then it can be shown that the corresponding scaling and
wavelet functions have a Holder regularity index $\alpha \approx 0.2K$. This approximate estimate is poor for small $K$
but improves as $K$ grows. Thus every additional zero at $\pi$ adds about 0.2 (that is, 20%) to the regularity index.

For $K = 4$ (i.e., 7th order $G_s(z)$) we have $\alpha = 1.275$ which means that the wavelet $\psi(t)$ is once
differentiable and the derivative is Holder continuous with exponent 0.275. For $K = 10$ (19th order
$G_s(z)$) we have $\alpha = 2.9$, so the wavelet $\psi(t)$ is twice differentiable and the second derivative has Holder
regularity index 0.9.

Design procedure. The design procedure is therefore very simple. For a specified regularity index
$\alpha$, we can estimate $K$ and hence $N = 2K - 1$. For this $K$, we compute the coefficients of the FIR half
band maximally flat filter $G(z)$ using (4.23). From this we compute a spectral factor $G_s(z)$ of the filter
$G(z)$. Tables of the filter coefficients $g_s(n)$ for various values of $N$ can be found in [Daubechies, 1992]. From
the coefficients $g_s(n)$ of the FIR filter $G_s(z)$, the compactly supported scaling and wavelet functions are
fully determined via the dilation equations. These wavelets are called Daubechies wavelets and were first
generated in [Daubechies, 1988]. Fig. 13.1(c) is an example, generated with a 9th order FIR filter $G_s(z)$,
whose response is shown as Case 2 in Fig. 13.1(a).
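The steps of the design procedure can be sketched in a few lines. This is one possible realization rather than the procedure of the cited references: it assumes the closed form $G(e^{j\omega}) = \cos^{2K}(\omega/2)\,P(\sin^2(\omega/2))$ with $P(y) = \sum_{k=0}^{K-1} \binom{K-1+k}{k} y^k$ in place of (4.23), picks the minimum-phase spectral factor (other factors are equally valid), and scales the result so that $G_s(e^{j0}) = 1$. The function names are hypothetical.

```python
import numpy as np
from math import comb

def halfband_remainder(K):
    """Centered Laurent coefficients of P(sin^2(w/2)): the part of G(z) without zeros at pi."""
    b = np.array([-1.0, 2.0, -1.0]) / 4.0    # sin^2(w/2) = (-z + 2 - 1/z) / 4
    p = np.zeros(2 * K - 1)
    term = np.array([1.0])                   # running power b^k
    for k in range(K):
        pad = (len(p) - len(term)) // 2
        p += comb(K - 1 + k, k) * np.pad(term, pad)
        term = np.convolve(term, b)
    return p

def spectral_factor(K):
    """Minimum-phase G_s(z) of order N = 2K - 1 with K zeros at pi, scaled so G_s(1) = 1."""
    roots = np.roots(halfband_remainder(K))  # roots occur in reciprocal pairs (r, 1/r)
    inside = roots[np.abs(roots) < 1.0]      # keep the root inside the unit circle
    gs = np.array([1.0])
    for _ in range(K):
        gs = np.convolve(gs, [1.0, 1.0])     # factor (1 + z^-1)^K
    gs = np.real(np.convolve(gs, np.atleast_1d(np.poly(inside))))
    return gs / np.sum(gs)

def power_symmetry_error(gs):
    """Largest deviation of the autocorrelation of gs from a half-band sequence."""
    r = np.correlate(gs, gs, mode="full")
    center = len(gs) - 1
    target = np.zeros_like(r[center::2])
    target[0] = 0.5                          # r(0) = 1/2, r(2m) = 0 for m != 0
    return np.max(np.abs(r[center::2] - target))

for K in range(1, 6):
    gs = spectral_factor(K)
    print(f"K = {K}, N = {len(gs) - 1}, power symmetry error = {power_symmetry_error(gs):.2e}")
```

For $K = 2$ this choice of spectral factor gives the familiar 4-tap filter with coefficients $(1+\sqrt{3})/8$, $(3+\sqrt{3})/8$, $(3-\sqrt{3})/8$, $(1-\sqrt{3})/8$. For large $K$ a more careful root-finding strategy may be needed.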

The above regularity estimates, based on frequency domain behavior, give a single number $\alpha$ that
represents the regularity of $\psi(t)$ for all $t$. It is also possible to define pointwise or local regularity of the
function $\psi(t)$ so that its smoothness can be estimated as a function of time $t$. These estimation methods,
based on time domain iterations, are more sophisticated but give a detailed view of the behavior of $\psi(t)$.
Detailed discussions on obtaining various kinds of estimates for regularity can be found in [Daubechies and
Lagarias, 1991], [Daubechies, 1992] and [Rioul, 1992].

14.  CONCLUDING  REMARKS 

We introduced the wavelet transform, and studied its connection to filter banks and short time Fourier
transforms. A number of mathematical concepts such as frames and Riesz bases were reviewed and used
later for a more careful study of wavelets. We introduced the idea of multiresolution analysis, and explained
the connections both to filter banks and wavelets. This connection was then used to generate orthonormal
wavelet bases from paraunitary filter banks. Such wavelets have compact support when the filter bank is
FIR. The regularity or smoothness of the wavelet was quantified in terms of the Holder exponent. We
showed that we can achieve any specified Holder exponent for compactly supported wavelets by restricting
the lowpass filter of the FIR paraunitary filter bank to be a maximally flat power-symmetric filter, with a
sufficient number of zeros at $\pi$.

Why  Wavelets? 

Discussions  comparing  wavelets  with  other  types  of  time  frequency  transforms  appear  at  several  places 
in  this  chapter.  Here  is  a  list  of  these  discussions: 

1. Sec. 2.5 discusses basic properties of wavelets, and Sec. 2.7 gives an elementary comparison of the wavelet
basis with the Fourier basis.

2. Sec. 3.2 compares the wavelet transform with the short time Fourier transform, and shows the time
frequency tilings for both cases.

3. Sec. 9 gives a deeper comparison with the STFT in terms of stability properties of the inverse, existence
of frames, and so forth.

4. Sec. 13 shows a comparison to the traditional filter bank design approach. In traditional designs, the
appearance of zero(s) at $\pi$ is not considered important. At the beginning of Sec. 13 (under "Why
regularity"), we discuss the importance of these zeros in wavelets as well as in tree structured filter
banks.

Further  reading 

The literature on wavelet theory and applications is enormous. This chapter is only a brief introduction,
concentrating on one dimensional orthonormal wavelets. There exist many results on the topics of
multidimensional wavelets, biorthogonal wavelets, and wavelets based on IIR filter banks. Two special issues of the
IEEE Transactions have appeared on the topic so far [IEEE, 1992] and [IEEE, 1993], covering some of these
topics. Multidimensional wavelets are treated by several authors in the edited volume [Chui, 1992b], and the
filter bank perspective can be found in [Kovacevic and Vetterli, 1992]. Advanced results on multidimensional
wavelets can be found in [Cohen and Daubechies, 1993], and the theory of biorthogonal wavelets is treated
in [Cohen, et al., 1992]. Sampling theorems for wavelet and multiresolution subspaces have been introduced
by Walter [1992], and extended by other authors. See Djokovic and Vaidyanathan [1994], and references
therein. Advanced results on wavelets constructed from M-channel filter banks can be found in the chapter
by Gopinath and Burrus (see [Chui, 1992b]), and in [Steffen, et al., 1993]. The reader can also refer to the
collections of chapters in [Chui, 1992b] and [Benedetto and Frazier, 1994], and many references therein.

APPENDIX A. DISTRIBUTIONS AND THEIR FOURIER TRANSFORMS


There are many commonly used examples such as $x(t) = 1$, which are not in $L^p$ for any finite $p$. For these we
cannot define the Fourier transform in the sense of Sec. 6.3. However, in electrical sciences (even in physics)
we often make statements like

    $x(t) = 1$ implies $X(\omega) = 2\pi\delta(\omega)$, and $x(t) = \delta(t)$ implies $X(\omega) = 1$ for all $\omega$.    (A.1)


The Dirac delta $\delta(t)$ is actually a fictitious function, assumed to be zero everywhere except at $t = 0$ (where
it is undefined), and satisfying $\int \delta(t)\,dt = 1$. It is often regarded as the limit of a sequence of functions $f_n(t)$
with $\int f_n(t)\,dt = 1$ and such that $f_n(t) \to 0$ pointwise for all $t \ne 0$. If the Dirac delta were a function in the
usual sense, its Lebesgue integral would be zero rather than one, since the "function" is zero a.e. There are
many such mathematical difficulties in dealing with the Dirac delta.

In mathematics, the delta function is regarded as a linear mapping that takes a function $s(t)$ as an
"input", and produces the number $s(0)$ as an output (compare with the statement $\int s(t)\delta(t)\,dt = s(0)$, which
can be found in engineering texts, e.g., [Oppenheim, et al., 1983]). Of course, we have to define the class of
allowed inputs $s(t)$ carefully. For example, if $s(t)$ is not continuous at $t = 0$ then the Dirac delta cannot be
properly defined. With appropriate restrictions on the class of allowed "inputs," such mappings are called
distributions (all precise definitions will be given below).

The space $\mathcal{D}$ from which the inputs $s(t)$ are drawn is usually restricted to be the set of all functions which
have two properties: (a) compact support (finite duration), and (b) infinite differentiability everywhere. In
particular they are continuous everywhere. We will see below that this allows us to define "derivatives of
distributions," and so forth. It has been found that a slightly larger class of functions $\mathcal{S} \supset \mathcal{D}$ is more useful.
The class $\mathcal{S}$ is defined similarly to $\mathcal{D}$, with the exception that the "compact support" requirement is replaced
with the milder condition that the functions decay faster than any inverse polynomial (i.e., faster than $|t|^{-n}$
for any integer $n$), as $|t| \to \infty$. It can be shown that, while $\mathcal{D} \subset \mathcal{S}$, the set $\mathcal{S}$ is itself smaller than $L^1$. That
is,

    $\mathcal{D} \subset \mathcal{S} \subset L^1$.    (A.2)

Recall that the FT of an $L^1$ function may not be in $L^1$ (Sec. 6.3). Similarly the FT of a nonzero function
in $\mathcal{D}$ is not in $\mathcal{D}$, because $s(t)$ and its FT cannot both be of finite duration. However, the set $\mathcal{S}$ sandwiched
between $\mathcal{D}$ and $L^1$ has this beautiful property: the $L^1$-FT of any function in $\mathcal{S}$ still belongs to $\mathcal{S}$ [Rudin,
1973]. This will be useful in defining Fourier transforms of tempered distributions (see below).


Definitions 


1. A functional $A[\cdot]$, defined on a set $\mathcal{R}$ of functions, takes an element $s(t) \in \mathcal{R}$ as input and produces
a complex number $A[s(t)]$ as the output. A linear functional is a functional that satisfies the usual
meaning of linearity, that is, $A[a_1 s_1(t) + a_2 s_2(t)] = a_1 A[s_1(t)] + a_2 A[s_2(t)]$.

2. Linear functionals on $\mathcal{D}$ are called distributions, whereas linear functionals on $\mathcal{S}$ are called tempered
distributions.

For example, if $A$ is the Dirac delta distribution, $A[s(t)] = s(0)$. Fig. A.1 shows a schematic of distributions
in general, and the Dirac delta in particular.


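[Figure: (a) the input $s(t)$ is mapped by $A[\cdot]$ to $A[s(t)]$, a real or complex number; (b) the input $s(t)$ is mapped by the Dirac delta to $s(0)$.]

Fig. A.1. (a) Schematic of a distribution $A[\cdot]$ and
(b) the example of delta function as a distribution.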


Regular and irregular distributions. Let $x(t)$ be a locally integrable function (i.e., the Lebesgue
integral $\int_a^b x(t)\,dt$ exists for every finite $a, b$), not growing faster than polynomials as $|t| \to \infty$. We can use it to define
a linear functional $A$ on $\mathcal{S}$ (or $\mathcal{D}$) as follows: for any $s(t) \in \mathcal{S}$ define

    $A[s(t)] = \int x(t) s(t)\,dt$.    (A.3)

We then say that the distribution $A$ is induced by the function $x(t)$. One can view the Dirac delta distribution
as being described by

    $A[s(t)] = \int \delta(t) s(t)\,dt = s(0)$    (A.4)

but it is not an induced distribution because $\delta(t)$ is only a fictitious function. Distributions that are induced
by locally integrable functions as in (A.3) are said to be regular. The Dirac delta is not a regular distribution.

Fourier transforms of tempered distributions. We define the Fourier transform of a tempered
distribution $A$ as another tempered distribution $\hat{A}$ such that

    $\hat{A}[s(t)] = A[S(\omega)]$.    (A.5)

Thus the FT of the distribution $A$ is the same distribution operating on $S(\omega)$ rather than $s(t)$, as shown
schematically in Fig. A.2. This definition makes sense because the domain of tempered distributions is the
class $\mathcal{S}$ and, as already stated, if $s(t) \in \mathcal{S}$ then $S(\omega) \in \mathcal{S}$.

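[Figure: applying $s(t)$ to the FT of $A[\cdot]$ is equivalent to passing $s(t)$ through a Fourier transform block and then applying $A[\cdot]$.]

Fig. A.2. The Fourier transform (FT) of a
tempered distribution, shown schematically.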

For example let $A$ be the Dirac delta distribution. Then

    $\hat{A}[s(t)] = A[S(\omega)] = S(0) = \int s(v)\,dv$.    (A.6)

The extreme right hand side represents a regular distribution induced by the locally integrable function
$X(\omega) = 1$. (Note that $v$ in (A.6) is just a dummy variable of integration; you can replace it with $\omega$.) So
we say that the FT of the Dirac delta distribution is the constant function ($X(\omega) = 1$ everywhere). What
we really mean is that the FT of the Dirac delta distribution $A$ is the distribution $\hat{A}$ induced by the constant
function. Fig. A.3 gives a schematic summary of this.

[Figure: (a) the Dirac delta distribution maps $s(t)$ to $s(0)$; (b) the FT of the Dirac delta distribution maps $s(t)$ to $S(0) = \int s(t)\,dt$, i.e., $s(t)$ is Fourier transformed and then applied to the delta distribution.]

Fig. A.3. (a) The Dirac delta distribution, and
(b) its Fourier transform, shown schematically.
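Both statements in (A.1) can be checked on a concrete input. The following sketch is illustrative only: it assumes the convention $S(\omega) = \int s(t)e^{-j\omega t}\,dt$, uses the shifted Gaussian $s(t) = e^{-(t-1)^2/2}$ as input, and replaces all integrals by Riemann sums on finite grids. The variable names are hypothetical.

```python
import numpy as np

def s(t):
    """Rapidly decaying input; the shift makes s(0) differ from the peak value."""
    return np.exp(-0.5 * (t - 1.0) ** 2)

t = np.linspace(-9.0, 11.0, 1001)
omega = np.linspace(-10.0, 10.0, 801)
dt, domega = t[1] - t[0], omega[1] - omega[0]

# S(omega) = integral of s(t) e^{-j omega t} dt, approximated by a Riemann sum.
S = (np.exp(-1j * np.outer(omega, t)) @ s(t)) * dt

# FT of the delta distribution acting on s: evaluate S at omega = 0.
delta_hat_of_s = S[np.argmin(np.abs(omega))]
integral_of_s = np.sum(s(t)) * dt

# FT of the distribution induced by x(t) = 1 acting on s: integrate S over omega.
one_hat_of_s = np.sum(S) * domega

print("S(0)               =", delta_hat_of_s.real, "  integral of s =", integral_of_s)
print("integral of S      =", one_hat_of_s.real, "  2*pi*s(0)     =", 2 * np.pi * s(0.0))
```

The first line reflects that the FT of the delta acts like the constant function 1, and the second that the FT of the constant 1 acts like $2\pi\delta(\omega)$.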

By a dual argument we can show that the FT of the constant function is the delta function. Again the
real meaning of this statement is: the FT of the regular distribution induced by the constant function is the
Dirac delta distribution. We can give similar meaning to the FT of a polynomial, and conversely interpret
a polynomial as the FT of a distribution. Example: the FT of the polynomial $t$ is the distribution that
extracts derivatives at the origin (why?).
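One possible way to answer the "why?" is sketched here. It assumes the convention $S(\omega) = \int s(t)e^{-j\omega t}\,dt$, which is the one consistent with (A.1), and applies the definition of the FT of a tempered distribution to the regular distribution $A$ induced by $x(t) = t$:

```latex
\hat{A}[s(t)] = A[S(\omega)] = \int \omega\, S(\omega)\, d\omega ,
\qquad
s'(t) = \frac{1}{2\pi} \int j\omega\, S(\omega)\, e^{j\omega t}\, d\omega
\;\Longrightarrow\;
s'(0) = \frac{j}{2\pi} \int \omega\, S(\omega)\, d\omega ,
\qquad\text{so}\qquad
\hat{A}[s(t)] = -2\pi j\, s'(0).
```

Under this convention the FT of $t$ therefore maps each input $s(t)$ to a constant multiple of its derivative at the origin.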

Many other operations on distributions can be defined similarly. For example the derivative of a
distribution is the distribution of the derivative (except for a sign): $A'[s(t)] = -A[s'(t)]$, as schematically shown in
Fig. A.4. This definition opens up a whole theory of calculus for distributions, but we will not require it
here. An introduction to distributions can be found in [Kolmogorov and Fomin, 1970]. Advanced results
can be found in [Rudin, 1973].
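The sign in the definition of the derivative can be motivated by the regular case. As a sketch, assume $A$ is induced by a differentiable function $x(t)$ whose derivative is also locally integrable and of at most polynomial growth (an assumption made here only for illustration). Integration by parts, with the boundary terms vanishing because $s(t) \in \mathcal{S}$ decays faster than any inverse polynomial, gives

```latex
\int x'(t)\, s(t)\, dt
  = \Big[ x(t)\, s(t) \Big]_{-\infty}^{\infty} - \int x(t)\, s'(t)\, dt
  = -A[s'(t)] ,
```

so the distribution induced by $x'(t)$ agrees with $A'$ as defined above.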


[Figure: schematic of the derivative of the distribution $A[\cdot]$.]

Fig. A.4. The derivative of a distribution $A[\cdot]$.